Introduction


Robert Dreeben and J. Alan Thomas 
University of Chicago 


The papers presented in this collection represent, we
believe, the emerging second generation of investigations
on educational effects. While they are not themselves
empirical works, they reexamine the existing empirical and con-
ceptual literature in ways that set a new agenda for understanding
how educational institutions operate.

The first generation of investigations consisted of three basic 
types. The first are those commonly considered to be "production
function" studies—in a loose and metaphorical sense of that term— 
of which the Coleman Report (Coleman et al., 1966) is surely the 
most widely known and is also perhaps prototypical. The Coleman 
Report is a school effects study that compares the characteristics 
of schools and treats the relationship between variation in those 
characteristics and variation in educational outcomes (aggregate 
levels of achievement). Schools are characterized by the composition 
of the teaching force, of the student body, and of the community; 
and these three elements are treated as stocks of resources repre- 
senting factors of educational production. While the Coleman Report 
concerns itself with schools, other investigations treat school districts 
or classrooms as centers of production. All of these investigations 
seek to identify educational resources whose variations will account 
for differences in educational achievement. 

The second type are commonly known as "status attainment" 
studies, some of which show how aspects of schooling influence 
levels of and changes in achievement of individuals in schools, while 
others show how they influence the life chances of individuals dur- 
ing their postsecondary and posttertiary school years. At issue here 
are such matters as school track, teacher quality, years of schooling
(examined in combination 


in this context must not be confused with school or educational
effects in the previous context. 

Finally, there are studies of variations in adult knowledge and 
moral conviction that are attributable to different amounts of
schooling during youth. This approach treats only the long-term


Within school problems as if they were status attainment problems. 
A case in point are the studies of "tracking," 
Studies of tracks, but of st 
Nevertheless, the issues o 
exception to this statement is the work on aptitude-treatment
interaction.) 

The second generation of educational effects studies addresses a 
variety of questions left unexamined by the first generation, and it is 
to these questions that the authors of the works included in this 
collection speak most centrally. As we indicated earlier, no one who 
was knowledgeable about schools seriously believed that all students 
in a class or a school were actually treated alike. However, because 
of artifacts related to high levels of aggregation, students within 
schools were treated statistically alike. It is one thing to say that they 
should not have been so treated; it is another to think conceptually 
and empirically about how they should be treated differently. Some 
of these papers address the latter point. 

To understand educational effects, one needs to discover directly 
what kinds and amounts of instructional resources are made avail- 
able both to groups of students within schools and within classes 
and to individual students. One needs to know whether certain cate- 
gories of students receive different allocations of resources according 
to their aptitudes, race, sex, or social class, for example. While in- 
ternal variations in student characteristics, related to variable alloca- 
tions of resources, have been an acknowledged but unaddressed 
problem, other problems have scarcely been acknowledged until 
very recently. First generation research was preoccupied with stocks 
of resources; it did not address itself explicitly to their allocation 
and utilization. If a school buys books, but the books stay on a shelf 
unread, the number of books measures school wealth, not resources 
brought to bear for instruction. Moreover, resources are not simply 
particles that can be assigned homogeneously or heterogeneously to 
different sorts of students. The allocation of resources can be under- 
stood in terms of the patterns of instructional groups, of different 
educational activities, of different kinds of learning materials, and of 
different formats (whole class and small group instruction, seat-
work, and various combinations). These facilities [...]
constitute processes of resource allocation. They [...]
categories of educational production. In short, the papers in this
volume indicate that educational production is comprised of various
processes and forms of resource allocation. They [...]
another aspect of educational technology and to [...]
a perspective on school operation that views it [...] rather than
statically in terms of resource availability.

Much of the first generation work on educational production [...]
by the fact that [...]
(or one at a time) was considered. It was constrained by the
fact that cross-sectional surveys made it impossible to think about
schooling as a value-added phenomenon occurring over time. While
these limitations and their remedies are obvious in principle, the 
remedies are not so easy to effect. To study multiple outcomes 
means understanding both the relationships among the outcomes and 
those between several outcomes and multiple inputs. And longi- 
tudinal research raises some baffling problems of just how to measure 
change. Second generation research, then, confronts not 
variations on familiar themes, b 
born methodological and conceptual issues. 


rooms exist in schools, that sch 
gradations of organization 
layered, nested structure. M 


Below we discuss 
authors in this volume 
offers. Despite diffi 


[...] the key aspects of the study of school
effects, namely, (1) conceptualizing and measuring [...], (2)
identifying and measuring those [...]
learning, and (3) examining the processes by which school character-
istics are brought to bear on the learning of students. Since these
topics are not mutually exclusive, some themes will recur in the
following pages.


THE MEASUREMENT 
OF EDUCATIONAL OUTCOMES 


Few of the first generational researchers were trained in the com- 
plex field of psychometrics, and it is not surprising that much of 
their work is vulnerable to criticism on the grounds that outputs are 
improperly conceptualized and somewhat naively measured. The 
authors in this volume who deal with this issue are especially con- 
cerned with the validity of test instruments, the measurement of 
gains in performance, and the conceptualizing of situations in which 
multiple outputs are jointly produced. 

In the case of large-scale studies using aggregated data, validity 
issues are relatively unimportant. Since achievement tests are val- 
idated on the basis of large populations, that no particular class- 
room teaches the precise content of the tests is relatively unimportant. 
(In fact, one danger in the use of these tests for evaluative purposes 
is that some educators may provide instruction based on the precise 
content of the tests, thus violating assumptions about the distribu- 
tion of scores.) However, if standardized tests are used as a method 
of evaluating the effectiveness of specific classroom procedures, the 
problem of content validity assumes considerable importance, since 
such tests do not reflect the content taught by any particular teacher. 
David Berliner points out that there is wide variation among class- 
rooms in the amount of time devoted to specific aspects of the cur- 
riculum, and he suggests that these variations will probably lead to 
differences in performance in achievement test items that reflect 
these content areas. Therefore, while standardized achievement 
tests may be highly reliable, they lack, in Berliner's terms, “content 
validity at the classroom level." 1 

This issue poses severe problems for the second generation re- 
searcher who is concerned with classroom effects. One solution may 
be to conduct experimental studies using criterion-referenced tests 
that are specifically designed to measure the skills included in the ex- 
periments. However, researchers who prefer naturalistic to experi- 
mental settings are faced with the problems that (1) existing achieve- 
ment tests do not reflect what is taught in a cross-section of classrooms 
and hence cannot be used to measure the effects of classroom 
processes and (2) no single criterion-referenced test can encompass 
the content taught in a variety of classrooms, especially if they are
purposely chosen to reflect the diversity that exists among school
systems.

Even more problematic than the issue of content validity is that 
second generational school effects studies must deal with the rela-
tionship between school processes and learning gains. Previous
studies that relied on cross-sectional data ignored this issue; however,
identifying the effects of specific classroom procedures or curricula
necessitates measuring the performance increment associated with
these alternatives.


epresents the “same 
ints elsewhere in the 
fill their future roles as members of the nation's productive system.
Jointness refers in part to Berliner's concept of the adjunct curricu- 
lum, but it is generalizable to a wider range of schooling outcomes 
than the term adjunct curriculum implies. 

Teachers or schools may therefore be consciously or unconsciously 
interested in a wide variety of goals, not all of which are measured by 
standard achievement test batteries. Furthermore, school resources 
are being utilized to produce several kinds of learning at a given time. 
Unless this complexity is recognized, classrooms or schools may 
appear to be more or less efficient than they in fact are. 

Chapter 1 (Heyns), Chapter 2 (Brown and Saks), and Chapter
4 (Berliner) deal in greater or lesser degree with issues of educa-
tional measurement. They highlight a number of complexities
that have not, for the most part, been explicitly recognized by first 
generational research on educational production. We turn now to a 
discussion of the issues in the measurement and conceptualization 
of those school characteristics believed to affect learning. 


IDENTIFYING AND MEASURING 
SCHOOL CHARACTERISTICS 


The clearest difference between first and second generational studies 
of the effects of schooling lies in the conceptualization of relevant 
school characteristics. First generational studies dealt primarily with 
stocks of resources such as the number of books in the library, while 
the second generational emphasis is on resource flows to students, 
the behavior of teachers rather than their demographic character- 
istics, and microlevel processes. Furthermore, second generational 
studies are almost unanimous in regarding students’ time as an im- 
portant input in schooling. 

Some of the reasons for this shift are almost self-evident. For 
example, such characteristics of teachers as their age, length of ex- 
perience, and college degree have no logical relationship with their 
classroom performance. Furthermore, there are dangers of mis- 
defining the direction of causality. If, as seems likely, the assignment 
of teachers to schools and to classrooms is, to a degree, a result of 
administrators’ and communities’ matching the characteristics of 
teachers with those of students, students’ learning could determine 
teacher characteristics rather than the reverse. The strongest reason 
for including such variables as the number of books in the school 
library, the experience and training of teachers, and class size as 
school effect variables is that all these measures affect cost and are 
therefore directly related to issues in resource allocation; but recent 
research is dedicated to determining more precisely the relationship 
between resource allocation and student learning and therefore con-
centrates on the manner in which resource inputs become embedded
in classroom processes, instructional formats, and classroom or-
ganizational procedures.


The manner in which educational expenditures are apportioned 
among students in a class is partially dependent on teaching strate- 


cate resources among students according to their own judgments 
and values, 


Thus, one major emphasis of second generational studies of school
effects is on what teachers do rather than on their
characteristics. A second [...]
wide difference between the amount of time made available to
students and the time during which they are actually engaged in ed- 
ucational activities. 

While Wiley's (1976) earlier study was based on the length of the 
school day, Berliner points to wide differences among classrooms in 
the amount of time devoted within the day to specific content areas 
in reading and mathematics. For example, in a period of about 
ninety days, one class devoted 400 minutes to linear measurement 
while another class spent only 29 minutes on this topic. Similarly, 
one class spent 573 minutes on creative writing, and another class 
spent only 56 minutes on this topic. Berliner infers that these differ- 
ences in time allocation will result in differences in average attain- 
ment in these topics. He also points out that these differences set 
limits on the usefulness of standardized tests, because "using stan-
dardized tests as outcome measures cannot be defended unless
natural variation in choice of content and time allocated to content 
areas of the curriculum are experimentally controlled." 

Harnischfeger and Wiley have developed an accounting system for 
categorizing the various uses of students' time. They begin by point-
ing out that a student's active learning time is less than the nominal
quantity of time made available by states and school districts, be-
cause the former is affected by loss of time due to illnesses and to
school closings because of strikes and adverse weather. Active learn-
ing time is the quantity of time available for a particular student:
this time is divided among various activities such as whole class time, 
teacher-supervised subgroup time, transition time, and so on. They 
conclude that studying the time allocations to various learning 
settings within the total curriculum provides a framework within 
which pupil learning, teacher effectiveness, and resource allocation
may be studied.

Brown and Saks make an interesting contribution to the examin-
ation of the complexities of the allocation, within classrooms, of
time and other resources. They suggest that classroom technologies
are analogous, not to the assembly line, but to the job shop in which
various kinds and quantities of both inputs and outputs are used and
in which the process of transforming inputs into outputs differs for
each unit of output. The relationship of students' time to learning
various subjects differs in terms of individual abilities and the nature
of each content area. Students are influenced in their allocation of
time and effort by the incentives made available to them. Brown
and Saks give substantial attention to the manner in which specific
types of incentives can be expected to influence students who
[...] characteristics.

To summarize, Chapter 2 (Brown and Saks), Chapter 4 (Berliner),
and Chapter 5 (Harnischfeger and Wiley) treat questions of the actual
allocation of resources in the production of education. They give 
special attention to the behavior of teachers and not merely, as is 
typical of first generational researchers, to the characteristics of 
teachers. All three also deal with the manner in which students' 
time is utilized in the production of learning. The next section deals 
with those aspects of the essays that are devoted to an analysis of 
what goes on in classrooms during the transformation of inputs into 
outputs, as well as to the implications of this analysis for the sta- 
tistical treatment of data. 


THE ANALYSIS 
OF CLASSROOM PROCESSES 


f a “public good.” Hence, to the degree that 
from those available to others. This "separability" model is closer
to the assumptions made by Harnischfeger and Wiley and leads to the 
notion that group size is a major determinant of the costs that can 
be allocated to individual students. 

Another issue is that material as well as human resources are in- 
volved in the production of learning. Although the hourly costs of 
such resources as books, individualized instructional materials, audio- 
visual equipment, and even space are low compared with the cost of 
teachers’ services, the use of these materials is important in the con- 
sideration of alternative teaching strategies. While the virtual im- 
possibility of comparing the marginal product of human and 
material resources precludes determining the marginal effect of 
various input combinations on specific outputs, it is possible that 
various inputs have specific and unique effects on learning. For 
example, the ability of students to substitute material resources for 
human resources may be a function of students' learning ability: 
high ability students may be more able than those of lesser ability 
to substitute, for example, books for teachers' services in the pro-
duction of learning. In this case, the relegation by Harnischfeger and
Wiley of “monitoring” to a relatively minor role may be unjustified, 
since with some students and some content areas, teachers may 
accomplish their task equally well by managing students' use of 
materials as by utilizing verbal behavior to produce learning; hence, a 
high reliance on lecturing and discussion as the main mode of in- 
struction may, for some students, be less efficient than utilizing a 
variety of methods that include the use of books and prepared seat- 
work. This may be especially true if a goal of instruction is to de- 
velop the student's ability to produce learning independently of 
adult help. 

An important characteristic of second generational studies of 
school effects is the new awareness that schooling is a multilevel 
process, involving individual students, classrooms, schools, school 
districts, and higher levels of government. Earlier studies often 
attempted to predict individual level performance on the basis 
of macrolevel data. A prime example of this type of error is making 
inferences about the distribution of educational outcomes on the 
basis of knowledge about district level expenditures per pupil;
however, expenditures differ across school districts, across schools
within districts, across classrooms within schools, and across students
within classrooms.

It is widely accepted that the learning of individual students is 
affected by their own characteristics, the structural characteristics 
of the classroom and school in which they are located, and the 
characteristics of their peers. In addition, the amount of money
made available at the district level provides resources that in turn 
represent opportunities to learn, while also setting constraints on 
learning. A student's learning may be affected by his relative standing 
within his classroom or within a specific set of peers. Thus, a student 
who achieves at, say, the 5.5 grade level may be in a different learn- 
ing environment if he is in a class in which the average achievement is 
at the 6.5 grade level than if he is in a class where the average 
achievement is at the 4.5 grade level. 

However, the multilevel view of learning should not be limited to 
examining the effects of higher level processes on individuals, since 
the characteristics of students may also affect the structuring of 
classroom activities, the kinds of teachers and principals who are 
hired, and the content of the curriculum. Teachers' behavior may be 
partially affected by the mean, variance, and skewness of the ability 
distribution of students, while a school district's budgetary policy
may be influenced by the characteristics of students. For example, 
if a large portion of the student body in a specific attendance area 
do not aspire to go to college, there may be pressures on the district 
to establish a vocational program; curricular specialization at the 
secondary school level has extremely important budgetary implica- 
tions, since vocational schools are, by and large, more expensive to 
operate than academic high schools. 

Burstein (Chapter 3) carefully sorts out the various types of 
effect and points out the dangers inherent in misspecifying the 
nature of cross-level influence on learning. We believe that this kind 
of analysis will be essential reading for researchers who wish to ex- 
pand knowledge about school effects. Equally important, this 
methodological discussion provides insights into the nature of a 
hierarchically structured organization where events occurring at one 
level affect those occurring at others.


Chapter 1


Models and Measurement for the 
Study of Cognitive Growth * 


Barbara Heyns 
University of California, Berkeley 


INTRODUCTION 


The purpose of this study is threefold. First, I will docu-
ment the pitfalls and empirical anomalies uncovered in
making inferences about cognitive growth from longitud-
inal test score data. Second, I will endeavor to account for the
empirical patterns in a plausible manner, using assumptions and 
models different from those current in the psychometric literature. 
Finally, I hope to link issues of measurement and measurement 
error to substantive concerns in sociology and to suggest ways in 
which one might assess learning. The study is intended to be both 
a relatively nontechnical assessment of the theory and measurement 
models associated with standardized testing and a cautionary tale for 
the analyst inexperienced with longitudinal data. 

The issues to be addressed arose during the course of work on 
an extensive body of longitudinal data on achievement test scores. 
Like many other educational researchers, I had become convinced 
that cross-sectional data were inadequate for testing propositions 
about learning in schools; to my mind, much of the research in 
education could and should be criticized on these grounds. One 


*Prepared for the University of Chicago Educational Finance and Produc- 
tivity Center and funded by the National Institute of Education, Grant No. 
G-74-0037. The computer simulation was completed by Richard Juster; Douglas 
Jones provided invaluable assistance with the statistical derivations found in the 
Appendix. Comments and suggestions from Educational Finance and Pro-
ductivity Conference participants, Chicago, June 6-8, 1978, were enormously
helpful.
is patently false. Classical test theory, like any other theory, involves
assumptions that should be subjected to test; the results of such 
tests should be the basis for an appraisal of the theory. This chapter 
hopes to raise issues conducive to a critical and constructive reap- 


At the outset it is important to clarify what will not be at issue. 
Perhaps the most frequent criticism of standardized testing concerns 
the validity of measures. It is commonly asserted that neither cog- 
nitive achievement nor intelligence is unidimensional and that no 
single score, however reliable, captures the diversity of skills and 
talents in a population. While this seems self-evident, it does not 


; I accept the propositions that the achieve- 
ment tests measure important skills and that they do so with an 
acceptable degree of accuracy. 


The tests that will be used for illustration were administered 
biannually by classroom teachers to over 1,000 fifth and sixth
grade children in a large urban school district. Four parallel forms, 
which taken together constitute the complete intermediate battery 
of the Metropolitan Achievement Test series published by Harcourt, 
Brace, and World in 1959, were retrieved from the centralized data 
bank of the school system; test results were matched with question- 
naires from a parental survey conducted concurrently with the 
final period of testing. The sample to be analyzed is restricted to the 
145 children for whom test scores were available at all four times, on 
the most reliable subtest—word knowledge (Heyns, 1978). Each form 
of this subtest consists of fifty-five multiple-choice items. The ob- 
served scores are normally distributed; plotting adjacent time periods 
reveals substantial linearity between pretests and posttest. The re- 
liability calculated for sample children on the final form (KR-20 = 
.943) compares favorably with the published reliability of .94. Al- 
though it was not possible to do an extensive item analysis on each 
form, I am convinced that the data are of high quality and probably 
superior to most data available for educational research and evalua-
tion.1
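
The KR-20 coefficient cited above can be illustrated with a minimal sketch. It assumes the usual Kuder-Richardson formula 20 for items scored 0 or 1, uses the population variance of total scores (some texts use the sample variance instead), and runs on a made-up four-item response matrix rather than the Metropolitan data. The function name `kr20` and the toy data are illustrative assumptions only.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores.

    responses: one list per examinee, each holding that examinee's
    0/1 score on every item. Illustrative sketch only.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])

    # Total score per examinee and its population variance.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_examinees
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_examinees

    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for item in range(n_items):
        p = sum(row[item] for row in responses) / n_examinees
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses: five examinees, four items.
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(kr20(toy_responses), 3))
```

With many more items, as on a fifty-five item subtest, the same computation applies unchanged; only the response matrix grows.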

The primary question is, To what extent can the results from 
these tests be used to infer the amount and pattern of cognitive 
growth during the two year interval examined? Table 1-1 presents 
the means for the national norming sample reported by the test 
publisher for the three most common metrics and the observed 
means calculated for a portion of the sample. Three observations are 
readily apparent. First, the means for both the sample and the nation 
routinely increased irrespective of metric. Statistical tests would re- 
veal that the differences were significant during both school years 
between October and May and were not significant during the inter- 
vening summer. Second, Table 1-1 reveals that the gains are not 
linear with respect to time; neither the sample children nor the 
norming population appear to have improved their vocabulary skills 
as rapidly during the sixth grade as they did during the fifth.2
The third observation is most troubling of all: the pattern of gains 
is not consistent across metrics but shows different rates of growth 
for given increments of raw score points. 

Based on these patterns, what can be said about the rate of learn- 
ing among the sample children? One answer, which is not entirely 
satisfactory, is that gains or rates of growth differ dramatically ac- 
cording to the scaling or metrics applied. Which metric is best? The 
answer to this question is a major objective of the discussion to 
follow. One might imagine a not wholly fabricated conversation be- 
tween a naive educational researcher and an experienced analyst. 
"Why don't the sample children increase their standard scores at
the same rate as the norming sample? Can I conclude that the schools 
in this district are failing their pupils?" asks the naive young re- 
searcher. 

"Longitudinal test scores typically exhibit fan-spread," responds
the experienced analyst. "As the means increase, so do the variances
of most test scores. Your sample began the fifth grade at a disad-
vantage; the children appear to fall progressively further behind be-
cause the total variability of achievement was increasing steadily.
Although it is possible that the schools are in part responsible for
these trends, it would be wrong to conclude that they must be.
Rates of cognitive growth differ among children; test scores tend to 
become more variable over time." 

The young researcher observes that, indeed, for each metric the 
sample variances tend to be proportional to the means; cognitive 
inequality appears to increase both between sample children and the 
nation and among children within the sample at each point of time. 
The researcher is now convinced that it does not make sense to hold 
the schools in this district accountable for the fact that children 
in the sample progressed less rapidly than the national average. In 
order to infer differential growth, one must have a comparison group 
matched by initial scores and perhaps by other characteristics as well. 
Without such a comparison, it is meaningless to ask whether a given 
group of children gained more or less than expected during a particu- 
lar interval, since one cannot assume that a single learning curve 
characterizes all children. 
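
The fan-spread argument can be made concrete with a small sketch. It assumes one simple mechanism, proportional growth at a common rate, which is not taken from the chapter; the starting scores, the 10 percent rate, and the names `grow` and `summarize` are all hypothetical. Under that assumption every child grows at the same relative rate, yet the variance rises with the mean and a group that starts lower falls further behind in absolute points.

```python
from statistics import mean, pvariance


def grow(initial_scores, rate, periods):
    """Return one score list per testing wave under proportional growth."""
    waves = [list(initial_scores)]
    for _ in range(periods):
        waves.append([score * (1 + rate) for score in waves[-1]])
    return waves


def summarize(reference, sample):
    """Per wave: reference mean, reference variance, reference-minus-sample gap."""
    rows = []
    for ref_wave, sample_wave in zip(reference, sample):
        gap = mean(ref_wave) - mean(sample_wave)
        rows.append((mean(ref_wave), pvariance(ref_wave), gap))
    return rows


# Hypothetical starting scores; the district begins 10 points lower.
nation = grow([30, 40, 50, 60, 70], rate=0.10, periods=3)
district = grow([20, 30, 40, 50, 60], rate=0.10, periods=3)

rows = summarize(nation, district)
for wave, (ref_mean, ref_var, gap) in enumerate(rows):
    print(wave, round(ref_mean, 1), round(ref_var, 1), round(gap, 2))
```

In this sketch the widening gap arises without any difference in how the two groups are taught, which is the analyst's point about needing a comparison group matched on initial scores.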

"Why don't sixth grade children learn as much as fifth graders?"
asks the young researcher.

"The vocabulary test may have a ceiling," responds the experi-
enced analyst.

"But it's not just the vocabulary test," says the young researcher.
"There appears to be a deceleration of learning over time for every
skill tested. Does this mean that the scores for high-scoring children
are less reliable than those for low-scoring children?"

"Not at all," responds the psychometric expert. "The item
analysis reveals that this test discriminates at least as well among
high-scoring as among low-scoring children."

"Then why do high-scoring children learn less on the average than
low-scoring children? The data consistently demonstrate that gains
are inversely related to initial scores."

"Regression effects," answers the senior analyst. "It is a common
observation that the scores of both high- and low-scoring children
regress to the mean."
"Then the gains and losses should be equal," says the young re-
searcher. "And the means should be an unbiased estimate of cog-
nitive growth. Could it be that learning rates actually decelerate over
time?"

"I suspect they do," responds the experienced analyst. "I have
long held to the theory that the innate capacity for learning de- 
clines with age. Younger children simply develop skills more quickly 
than children somewhat older, and your tests reflect this fact." 

Somewhat mollified, the young researcher returns to her computer 
printout. Cognitive growth decreases with time but not at a constant 
rate for all children since fan-spread occurs. Empirically, one should 
conceptualize learning as a decelerating function of time but with 
increasing variability among children. It all makes sense. 

Being particularly compulsive, the young researcher proceeds to 
analyze her data in terms of several different metrics; being socio- 
logically predisposed, she also examines learning rates for children 
from different backgrounds. It soon becomes apparent that the 
several metrics yield different distributions of scores and very dif- 
ferent conclusions about learning. Raw scores tend to show the 
largest gains for children from disadvantaged backgrounds, while 
grade equivalents imply that poor children learned the least during 
any particular period of time. 

“Which metric is right?" she asks. 

“They measure different things,” answers the experienced analyst.

“They seem to measure different things by national standards
than they do in the sample,” the researcher complains bitterly,
brandishing a table not unlike Table 1-1. “A mean standard score
of 52.7 points is equal to a grade equivalent score of 6.8 years in the
nation; in the sample, a standard score of 50.6 is equated to 6.9
years.”

“The reason,” responds the analyst, “for these apparently baffling
results is that the metrics you are comparing have unequal intervals.
Grade equivalent scores are not linear transformations of either raw
scores or standard scores. A given gain in raw scores is worth
progressively more in grade equivalent units as one ascends the scale.
Had you transformed your raw score values to grade equivalents after
averaging, you would not have obtained the same figures.”

“But which metric is right?” reiterates the researcher. “I want to
determine whether students with college-educated mothers learned
more or less than did students with less educated mothers. This
table gives me three different answers.”

The young researcher displays the puzzling configuration of gains
found in Table 1-2 to the veteran and asks once again, “Which
metric gives the best estimate of change?”

Table 1-2. Comparison of Mean Gain Scores by Metric and Maternal
Education, Metropolitan Achievement Test, Word Knowledge (N = 745).

                              Raw      Standard    Grade Equivalent
Mother's Education            Score    Score(a)    Score(a)
Total                         5.5      4.0          .91
Less than high school         6.6      4.0          .58
High school graduate          6.3      4.0         1.02
Some college                  5.0      4.1         1.19

(a) Standard scores and grade equivalents were equated to the individual’s
observed raw score, based on the published norms, and then averaged.

Source: Heyns, 1976. Used by permission.


“You really should not be using difference scores, you know,”
says the analyst. “Change scores are notoriously unreliable.”

“You mean I can’t measure learning?” asks the young researcher. 

“Of course you can measure learning,” says the expert reassuringly. 
“It’s merely change which is problematic.” 

Puzzled, the young analyst then asks her colleague to explain the 
results of Table 1-2. 

“This table reveals precisely the pattern of effects we have been 
discussing. Your measure of maternal education covaries with both 
the pretest and the posttest. The inverse relationship between raw 
score gains and the covariate is an obvious example of the regression 
fallacy. Both the pretest and the posttest are measured with error, 
and since change scores embody both sources of error, they are less 
reliable than either single score. Errors of measurement frequently 
result in a negative correlation between gains and pretest scores or 
with any correlate of pretest—in this case, mother’s education.” 

“But I thought you said that the errors were random?” 

“But with fallible measures, the correlation between a pretest
and a change score is frequently negative. You have merely
demonstrated a variant of the familiar regression fallacy.”

“And why do the grade equivalent scores yield a positive assoc- 
iation between cognitive growth and maternal education?” asks the 
neophyte. 

“Grade equivalent scores are not very reliable,” responds the
senior analyst. “I have recommended that they be abolished. What
is happening is that the high-scoring students are benefiting from
being further up the scale; their gains in raw scores are lower, yet
they’re credited with a larger increment of gain. The most appropriate
metric is the standard score, which tells you that the observed gains
for the various levels of maternal education are the same. That is,
students learn at a rate that is not affected by the mother's
education. That is what you wanted to know.”

“That is not what I wanted to know,” wails the young researcher. 
“The standard scores were constructed to have equal variances at 
each date. Why should I expect to see any change in relative po- 
sition? What I want to know is why, if fan-spread exists, shouldn’t 
one find an increasing educational gap between high and low status 
children over time? Alternatively, if learning is a decelerating func- 
tion of time, perhaps it is decelerating more rapidly for the ad- 
vantaged groups, as the raw scores suggest. I want to know which 
metric best describes cognitive growth for children from diverse 
backgrounds.” 

"My dear, you have much to learn,” says the senior analyst. With 
these words, he gives the young researcher several weighty volumes, 
a few thin articles replete with mathematical equations, and a bottle 
of aspirin. 

In time, the young researcher learns to accept the psychometric 
rationale offered and to limit her research to the most reliable 


about educational processes: there is little differential cognitive
growth that can be attributed to either family background or schooling.

as “the most important technical contribution psychology has made
to the practical guidance of human affairs” (Cronbach, 1970:197).
In recent years, the use of standardized achievement tests for
research and evaluation has mushroomed. Almost without exception,
such research has sought to discover whether or not educational
programs significantly influence the cognitive growth of students.
The findings associated with such research have, with startling
regularity, been negative; educational programs of all types have been
found to contribute little to the cognitive growth of children. Such
results have in turn generated enormous criticism of the schools,
as well as a tendency, more or less fully articulated, to regard ability
or achievement as an immutable trait, determined to a large degree
by genetic factors.

If one accepts the premise that the central questions for educational
research and evaluation revolve around learning and cognitive
growth, one must ask whether the tests that enjoy such widespread
use are capable of measuring change. Although creative explanations
for null findings abound, the logically prior question is
whether one can use tests to assess growth. Yet this fundamental
question is infrequently posed. One must ask whether or not measures
and models purporting to assess learning can actually do so. If they
cannot, we are wasting an enormous amount of time and money
using tests for this objective. If they can, but only under certain
assumptions, the implications of these assumptions should be the
focus of attention. Begging these questions, or reformulating the
problem out of existence, will not lead to a greater understanding of
cognitive growth. The dilemmas raised by the measurement and
analysis of change scores are a case in point. Analysts are advised not
to study learning or change directly, but to utilize correlational
techniques for uncovering relationships of interest (Cronbach and
Furby, 1970). While such a recommendation is sound statistically,
it amounts to suggesting that the causes of change are amenable to
study, while the process is not.

In order to infer learning based on achievement tests, one must

ed at different positions

and intervals is basic to the study of change. The assumptions
imposed by accepting a particular distribution are not neutral with
respect to questions of change; they determine in a fundamental
way the conclusions reached.

The argument that I will support at some length is, first, that
there are both logical and empirical reasons for believing that the 
underlying skills and aptitudes measured by test scores are not 
normally distributed but positively skewed; and, second, that the 
errors of measurement involved in testing are not random with 
respect to true scores. In the context of longitudinal data, neither 
assumption seems tenable. Furthermore, these assumptions are con- 
sequential; they substantially affect research results and conclusions. 
Although correlational analyses have many virtues, including con- 
sistent results irrespective of the metric assumed, the presence of 


find greater empirical support. The ultimate goal, which exceeds the 
scope of this chapter by a wide margin, is to arrive at a basis for 
verifying and validating measurement and measurement models for 


MEASUREMENT MODELS AND CLASSICAL 
TEST THEORY 


: trast, one cannot assume that the object 
of research is unaffected by the measurement. Repeated measure- 


tent, measures are inevitably fallible. 

Classical test theory derives from an explicit rationale for measurement
and from a set of mathematical propositions regarding errors
of measurement that links observations to the construct being
measured. The basic relationship posited, which is generally regarded
as axiomatic, is that an observed test score consists of two
independent additive components, a true score and an error. A critical
test for any theory, whether expressed mathematically or verbally,
is not only how well phenomena are explained but how well the
theory accounts for what cannot be explained. Classical test theory,
much like the theoretical formulations of less rigorous areas of
inquiry, assumes that extraneous factors or errors are random with
respect to the observations or processes studied.

The most parsimonious version of test theory can be reduced to 
four basic propositions regarding errors of measurement: 


$$X = T + e; \tag{1.1}$$

$$E(e) = 0; \tag{1.2}$$

$$\rho_{Te} = 0; \tag{1.3}$$

$$\rho_{e_1 e_2} = 0, \tag{1.4}$$

where $X$ is the observed score, $T$ is the true score, $e$ is an error
term, and $e_1$ and $e_2$ are independent observations. This formulation
does not depend on assumptions about the distribution of unobserved
values such as the true score or the error. For this reason, it
is argued that distributional assumptions are not integral to the
theory (Lord and Novick, 1968). It is assumed that two parallel
measurements of a true score are taken and that each contains error.
Errors of measurement are of two kinds: systematic and random. If
there are systematic errors, the two measurements are by definition
not parallel; if the tests yield parallel measurements, it is assumed
that only random errors influence the observed scores. If errors are
random, their expected value is equal to zero and they are assumed
to be independent of true scores. Two parallel tests must, therefore,
have equal means and variances; errors of measurement must, by
definition, have equal variances on parallel tests and must be
uncorrelated with each other.
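
As a rough check on what propositions (1.1) through (1.4) imply, the following sketch simulates two parallel measurements of the same true scores. The sample size, the normal distributions, and the variance values are illustrative assumptions only; the theory itself makes no distributional claims.

```python
import random

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5

rng = random.Random(1)          # fixed seed so the sketch is reproducible
n_examinees = 20000             # assumed sample size
true_sd, error_sd = 10.0, 5.0   # assumed spread of true scores and of errors

true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_examinees)]
errors_1 = [rng.gauss(0.0, error_sd) for _ in range(n_examinees)]
errors_2 = [rng.gauss(0.0, error_sd) for _ in range(n_examinees)]

# Equation (1.1): each observed score is a true score plus an error.
form_1 = [t + e for t, e in zip(true_scores, errors_1)]
form_2 = [t + e for t, e in zip(true_scores, errors_2)]

# With random, uncorrelated errors, the correlation between parallel forms
# should be close to the reliability, Var(T) / Var(X).
expected_reliability = true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)
parallel_correlation = correlation(form_1, form_2)

print("expected reliability:", round(expected_reliability, 3))
print("correlation between forms:", round(parallel_correlation, 3))
print("mean difference between forms:", round(mean(form_1) - mean(form_2), 3))
```

Because both forms share the same true scores and have identically distributed errors, their means and variances agree up to sampling fluctuation, which is exactly why such forms cannot register growth.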


Classical test theory can be derived from these four specifications
regarding errors, provided that one accepts the notion of parallel
measurement. As Lord and Novick (1968) demonstrate, it is unnecessary
to make any assumption about the distributions of either true
scores or errors if one can replicate measurement. If two successive
measurements have the same true scores and identically distributed
errors of measurement, then they are equivalent. Under the assumptions
of classical test theory, repeated measurements are parallel
if their first and second order moments are equivalent. Hence, by
definition, parallel measurements will have equal means and variances.

Parallel measurements, however, cannot be used for studying 
growth for either individuals or a sample. If the means or the var- 
iances of two test forms are shown to differ, as they must if one 
intends to infer change for all or part of a sample, the measurements 
are not equivalent. Without equivalent measures, interval assump- 
tions and scaling are crucial. In order to compare results for in- 
dividuals or for the total sample, one must make additional assump- 
tions about test score distributions. Although it is possible to deduce 
a considerable portion of classical test theory assuming only ordinal 
measurement and without stipulating a specific distribution for test 
scores, interval measurement is essential for studying change. 

Distributional assumptions are not troubling if the major interest 
is the measurement of a static attribute or propensity. However, they 
are enormously important for an assessment of growth or change. If 
two measurements differ with respect to either their means or their 
variances, one cannot assume replicate or equivalent measurement. 


equal intervals, one cannot compare changes in test scores at
different points on a scale.


When pressed, the stance taken by most psychometricians toward


ent. Since means, variances, and intervals are determined by the


occurred in the underlyin
Stability of test scores by
ordering of students at particular junctures, but not the amount of
change or the degree to which change is uniform at different points 
along a scale. In short, if one is unwilling to make distributional 
assumptions, one cannot study change or learning directly. 

The importance of metric assumptions for the study of change
has not gone unnoticed in the literature; however, it does not occupy
the position of importance it merits. As Bereiter (1963:10) observes,
to “confront the issue fully is to see psychometric theory
totter.” Both the validity and the reliability of individual achievement
tests have been the subject of debate, deliberation, and much
research. The point to be made in this context is that the most
reliable tests in existence will not yield valid measures of growth
unless one posits assumptions about metrics. Measures of learning
depend fundamentally on changes observed in the means and variances
and only secondarily on higher order moments.

As an illustration, suppose that two students are given a particular
test and their scores are found to differ. The question of whether
the achievement of one student exceeds that of the other is entirely
determined by the reliability and validity of the test. If one knows
or can estimate its reliability, the probability that the higher-scoring
student is more knowledgeable in the skill tested can be determined
very accurately. Assume that a second test, an equally valid measure
of the skill, is administered at a later time; assume further that one
student is observed to have gained more than the other. One can
infer which student performed better the second time with some
certainty; however, without assuming a specific interval scale, it is
not possible to determine which student learned more. Since the two
students differed initially, their gains will not encompass comparable
portions of the test scale. It is always possible to devise an admissible
monotonic transformation that inflates the distance traversed by one
student differentially, while not altering their relative positions on
either test. A transformation sufficient to reverse the intuitive
conclusion regarding which student gained more would be nonlinear in
form, and it might be difficult to justify. However, a transformation
of scale capable of such a substantive reversal would not violate the
measurement properties of standardized tests in any way.
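
The two-student illustration can be made concrete with a small sketch. The scores and the squaring transformation below are hypothetical, chosen only because squaring is strictly increasing over positive scores and therefore preserves every rank order.

```python
def transform(score):
    """A strictly increasing rescaling of positive scores (illustrative choice)."""
    return score ** 2

# Hypothetical (first test, second test) raw scores for two students.
student_a = (20, 30)
student_b = (40, 48)

raw_gain_a = student_a[1] - student_a[0]
raw_gain_b = student_b[1] - student_b[0]

rescaled_gain_a = transform(student_a[1]) - transform(student_a[0])
rescaled_gain_b = transform(student_b[1]) - transform(student_b[0])

# Relative positions on each test are the same before and after rescaling.
order_preserved = all(
    (student_a[i] < student_b[i]) == (transform(student_a[i]) < transform(student_b[i]))
    for i in range(2)
)

print("raw gains:", raw_gain_a, raw_gain_b)
print("rescaled gains:", rescaled_gain_a, rescaled_gain_b)
print("rank order preserved on both tests:", order_preserved)
```

On the raw scale the initially lower student gains more; on the rescaled metric the initially higher student gains more, although neither student's standing on either test has changed.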

In sum, distributional assumptions establish the metrics essential
wth; without such assumptions,
inferences are not possible. Conceived of necessity, metric
assumptions are often regarded as innocent inventions, fathered by


theory. Without systematic pro-
s, test theory loses a
great deal of practical utility; accepting the need to scrutinize and
validate distributional assumptions and patterns of cognitive change
adds a new dimension to the concerns of psychometric theory and
raises the issue of the adequacy of classical test theory.


DISTRIBUTIONS AND METRICS 


One common technique for attributing interval measurement to
standardized achievement tests is to normalize raw scores and scale
the resulting values in terms of the interval values implied by a
normal distribution (Jones, 1971; Angoff, 1971). These procedures are
justified in terms of the central limit theorem and by the fact that
observed test scores typically approximate a normal distribution
reasonably closely when based on a fairly large sample size. The
assumption that the underlying distribution of ability or achievement
is normal has nearly attained the stature of a scientific truth in the
psychometric literature. Since Galton published Hereditary Genius in
1869, which claimed to show that all physical traits and natural


tributed has not been seriously challenged. Moreover, normal
distributions provide statistical advantages that are not found in other


relationship to student growth. 


The logic of developing such scales is similar to that proposed by 
Abelson and Tukey (1959), who argue that scales should be based 
on the additivity of effects. One can imagine calibrating scores to 
reflect the progressive learning of some finite universe of skills and 
knowledge or, alternatively, linking scales to temporal norms. 
Criterion-referenced tests and mastery learning are based on similar 
objectives (Block, 1971; Bloom, 1976). Recent work with latent
trait models devised to yield “person-free test calibration” based on
the item-characteristic curves also shows promise (Goulet, Linn, and
Tatsuoka, 1975).

I suspect that measures of achievement that embody empirically
derived metrics for measuring cognitive growth will have substantially
more skewed distributions than conventional standardized scores.
This assertion is based on two observations drawn from longitudinal
test score data. If one assumes that individual test items can be
scaled according to the median age or grade level of children capable
of successfully answering the item and that a single learning curve
characterizes all children, the resulting values would closely resemble
grade equivalent scores. Scores based on grade equivalent norms have
distributions with substantially different interval properties than do
the raw or standard scores: this distribution is invariably more
skewed. Although the correlations between raw scores and grade
equivalents tend to be close to unity, they portray very different
patterns of learning. Increments of gain in raw score points are weighted
progressively more heavily as one ascends the measurement scale;
each successive correct answer added to a child's score translates into
a progressively greater amount of achievement in temporal units.
Thus one finds, as in Tables 1-1 and 1-2, that larger grade equivalent
gains accrue to children farther up the scale despite smaller relative
increments of raw score or standard score points.
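
The weighting just described can be sketched numerically. The conversion function below is a hypothetical convex mapping, not a published norm table; it assumes only that each additional raw score point is worth more grade equivalent units higher up the scale.

```python
def to_grade_equivalent(raw):
    """Hypothetical convex raw-to-grade-equivalent conversion (illustrative only)."""
    return 1.0 + 0.05 * raw + 0.002 * raw ** 2

def gains(pre, post):
    """Return (raw gain, grade equivalent gain) for one child."""
    return post - pre, to_grade_equivalent(post) - to_grade_equivalent(pre)

# A lower-scoring child gains 7 raw points; a higher-scoring child gains 5.
low_raw_gain, low_ge_gain = gains(15, 22)
high_raw_gain, high_ge_gain = gains(40, 45)

print("low scorer: raw", low_raw_gain, "grade equivalent", round(low_ge_gain, 3))
print("high scorer: raw", high_raw_gain, "grade equivalent", round(high_ge_gain, 3))

# Order of operations matters on a nonlinear scale: converting each score
# and then averaging differs from averaging and then converting.
raw_scores = [15, 45]
mean_of_converted = sum(to_grade_equivalent(r) for r in raw_scores) / len(raw_scores)
converted_mean = to_grade_equivalent(sum(raw_scores) / len(raw_scores))
print(round(mean_of_converted, 3), round(converted_mean, 3))
```

Under these assumed values the child with the smaller raw gain receives the larger grade equivalent gain, and the two ways of averaging disagree, which is the kind of discrepancy the text attributes to unequal intervals.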

Grade equivalent scores are at best a primitive approximation to
a temporal metric; they assume a single learning curve based on the
test scores of the child, and they typically contain an unknown
amount of interpolating and “smoothing” of values. The balance of
opinion discourages their use for educational research. Many of the
reasons given are technical rather than substantive: grade equivalent
scores are not normally distributed, error terms are quite
heteroscedastic, and the measures are less “well behaved” in statistical
analysis. Nonetheless, grade equivalent scores are the only commonly
available metric that embodies units of growth based on a scale
independent of the distribution of children.* Further refinements of
existing scales are doubtless possible and desirable; however, the
distribution of scores scaled relative to time would be more skewed
than that observed for raw scores.


The rates of learning by metric depicted in Table 1-2 are instructive.
If one assumes that an appropriate metric for scaling achievement
items is a child’s grade level, one finds that difference scores
are predictably related to socioeconomic differentials. By changing
the distributional assumptions underlying the metric and imposing
an age-graded scale, one achieves intervals that, I would argue, are
substantially more realistic sociologically. If achievement is assumed
to be normally distributed, one observes “regression toward the
mean.” If intervals are based on temporal scales, gain scores are
sufficiently robust to yield measures of learning. A similar pattern 
pertains when one compares gain scores by IQ level or by other 
measures of parental status. I would argue that many of the findings 
in the literature attributed to “regression effects” are actually 
artifacts of the intervals or distributions assumed. Studies have 
consistently documented very low and not infrequently negative 
correlations between ability levels and the rate of learning among 
children (Anderson, 1939; Bloom, 1964; Fleishman, 1965; Zeaman 
and House, 1967). If one were to accept this finding at face value, it
would belie the experience and observation of anyone who has 
taught children. If one assumes that measurement error is solely 
responsible, it is difficult to justify using the tests to study learning. 
The rather modest transformation implied by grade equivalent 
scores yields more credible metrics, without any assumptions about 
measurement error. I suspect that a good many of the null findings 
regarding change that are interpreted as regression effects should be 
traced to inappropriate interval assumptions. With appropriate met- 
rics, one would discover that both low and high ability students 
progressed. The observed “regression toward the mean” is the
result of learning among low ability children and a systematic devalu- 
ation of gains to high ability children.^ 
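
For comparison, the conventional measurement-error account of such regression effects can be sketched as a simulation. Every child is assigned the same true gain, so any association between observed gains and pretest scores is produced by error alone. The sample size, means, and variances are assumptions made for illustration.

```python
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)   # fixed seed for reproducibility
n_children = 5000        # assumed sample size
true_gain = 5.0          # identical true growth for every child (assumption)

true_scores = [rng.gauss(50.0, 10.0) for _ in range(n_children)]
pretest = [t + rng.gauss(0.0, 5.0) for t in true_scores]
posttest = [t + true_gain + rng.gauss(0.0, 5.0) for t in true_scores]
observed_gain = [post - pre for pre, post in zip(pretest, posttest)]

# The pretest error enters the gain with a negative sign, so observed gains
# correlate negatively with the pretest even though true gains do not vary.
r_gain_pretest = pearson(observed_gain, pretest)
r_gain_true = pearson(observed_gain, true_scores)

print("correlation of gain with pretest:", round(r_gain_pretest, 3))
print("correlation of gain with true score:", round(r_gain_true, 3))
```

The sketch reproduces the negative gain-pretest correlation under purely random error; it says nothing about whether the intervals of the scale are appropriate, which is the separate issue raised in the text.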

There are also logical reasons why one might expect that the con- 
ventional intervals do not reflect the true distribution of achieve- 
ment. Ordinarily, raw scores are summed, without weighting; a cor- 
rect response is assumed to reflect a similar amount of achievement, 
however difficult or easy any particular item is. The forty-second 
correct answer adds as much to one's total score as the thirteenth. 


The items included on achievement tests typically vary considerably 
in their level of difficulty. It seems 


sequently gains, 
raw scores. 
Finally, there are theoretical reasons for expecting that the
distribution of “true” achievement is skewed. There is a necessary
connection between the form of a distribution and the processes
that generate observed outcomes. Numerous experimental studies
of learning imply that the rate of skill acquisition is not linear with
respect to time. Although learning theorists have concentrated on
narrowly defined skills that are probably not directly comparable
with the skills tapped by achievement tests, consistent findings about
the process have been reported. Stevens and Savin (1962) argue
that skill acquisition in a wide range of areas is a power function of
the time elapsed or the amount of practice. The actual rates of
growth differ, depending on whether one studies nonsense syllables,
for example, or psychomotor abilities; yet these studies find that
individual growth is proportional to the prior level of skill. If the
behavioral processes governing learning are generalizable, perhaps
because of behavioral reinforcement, one would expect a skewed
skill distribution that would become more skewed over time
(Hamblin et al., 1971).
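
A minimal sketch of this argument follows. It assumes, purely for illustration, that each child's growth in a period equals a randomly varying rate multiplied by the child's current skill level; the starting distribution, the rate parameters, and the number of periods are hypothetical and are not taken from the studies cited.

```python
import random

def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(2)   # fixed seed for reproducibility
n_children = 20000       # assumed sample size
n_periods = 10           # assumed number of learning periods

# Start from a roughly symmetric distribution of skill.
skills = [rng.gauss(10.0, 1.0) for _ in range(n_children)]
skew_by_period = [skewness(skills)]

for _ in range(n_periods):
    # Growth is proportional to the prior level; the rate varies by child and period.
    skills = [s * (1.0 + rng.gauss(0.10, 0.05)) for s in skills]
    skew_by_period.append(skewness(skills))

print("initial skewness:", round(skew_by_period[0], 3))
print("final skewness:", round(skew_by_period[-1], 3))
```

Under these assumptions the distribution begins nearly symmetric and ends with a clear positive skew, consistent with the expectation stated above; a common growth rate for all children would rescale the distribution without changing its shape.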

In sum, there are sociological, logical, and theoretical reasons for
assuming that achievement scores should be skewed, not normal. If
one makes modest alterations in the shape of the distribution, such
as those suggested by grade equivalent scores, the relative difficulty
of items, or a power function, the resulting intervals and metrics
describe quite different patterns of learning. While it is not now
possible to infer the shape of the underlying true score distribution,
it seems reasonable to conclude that it would not be bell-shaped.
It is possible to calibrate metrics temporally and to validate
intervals or substantively, although this has not been done. Doing
so would surely yield a metric more appropriate for studying
change.

Assuming that the underlying distribution is skewed rather than
normal produces patterns of learning or gain that differ markedly
from raw scores. High ability students, as well as students from
relatively advantaged backgrounds, achieve consistently more during
specific time intervals than is the case when one assumes a normally
distributed variable. Furthermore, ability and social class are shown
to interact with the effects observed for educational programs
(Heyns, 1978). While it is intuitively plausible that educational
programs are not equally effective for all students, statistical controls
for individual differences in background or ability tend to obscure
the patterning of gains and to mask interactions. Unless one is able
to randomize groups experimentally, and thereby equate initial
scores and other correlates of achievement, mean differences cannot
be compared without knowledge or assumptions about intervals. I
would contend that a large number of nonexperimental studies of
education have erred by introducing statistical controls for individual
differences that overwhelmed group effects, under the assumption
that an appropriate model was additive rather than interactive.
Without valid intervals, interaction effects are meaningless (Wilson,
1971).
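
The dependence of interaction effects on the metric can be shown with a small sketch. The four scores below are hypothetical, one child per cell of a status-by-program design, and the two rescalings are arbitrary strictly increasing functions.

```python
import math

# Hypothetical scores: one child per cell (status, program).
scores = {
    ("low", "control"): 10.0,
    ("low", "treated"): 15.0,
    ("high", "control"): 20.0,
    ("high", "treated"): 25.0,
}

def interaction_contrast(cells):
    """Program effect for high status children minus that for low status children."""
    high_effect = cells[("high", "treated")] - cells[("high", "control")]
    low_effect = cells[("low", "treated")] - cells[("low", "control")]
    return high_effect - low_effect

def rescale(cells, func):
    """Apply a monotonic transformation to every score."""
    return {cell: func(value) for cell, value in cells.items()}

additive_contrast = interaction_contrast(scores)
squared_contrast = interaction_contrast(rescale(scores, lambda v: v ** 2))
logged_contrast = interaction_contrast(rescale(scores, math.log))

print("original metric:", additive_contrast)
print("squared metric:", squared_contrast)
print("log metric:", round(logged_contrast, 3))
```

The same four children show no interaction on the original metric, a positive interaction after squaring, and a negative one after taking logarithms, although their rank order never changes.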

When confronted with ordinal measurement, sociologists have 
typically relied on the reassuring observation that correlations are 
remarkably stable when subject to monotonic transformations 
(Labovitz, 1970; Vargo, 1971). Yet many educational problems 
logically imply an interactive model rather than an additive one. 
Sociologists have traditionally conceived of social class, for example, 
as a causal agent, interacting with other environmental influences. 
If one assumes that social class interacts with other attributes and 
experiences of individuals, additive models would not yield credible 
results. Indeed, as the next section details, the conventional assump- 
tions about random measurement error are not warranted when 
assessing achievement in a longitudinal frame. If errors of measurement
are not independent of true scores, interactive effects will
tend to be underestimated and inconsistent. The sanguine presumption
that correlational techniques yield valid estimates of the effects
of other variables would be justified only if one could assume that
errors are uncorrelated.


CORRELATED ERRORS OF MEASUREMENT 


Longitudinal test score data reveal several other troubling patterns. 
For example, the correlations between successive forms are observed 
to increase during successive time intervals (Heyns, 1978). Not only 
does the effect of the pretest increase, but so does the correlation 
between achievement and every correlate of achievement when based 
on nominally equivalent forms. If one were to assume that errors of 
measurement were uncorrelated with true scores, one would have to 
conclude either that the true relationship between achievement and 
every presumed cause of achievement increased with time or that 
the error variance declined relative to the total observed variance. 

Taken alone, the increasing correlations would not disturb many 
analysts. There are several po 


First, it seems plausible to assume that the reliability of the tests 


a function of sample heterogeneity (Lord and Novick, 1968). Second, 
parallel though not identical forms;
perhaps a sequence of similar tests provides an opportunity for
students to practice and consequently decreases the likelihood
of random errors. Perhaps repeated testing serves to standardize the
procedures for administering and scoring tests accurately, with the
same result. Finally, it is possible that the relationship between
test scores and the determinants of achievement becomes more
predictable over time. Tests tend to become more stable and to
exhibit greater reliability as children mature. Perhaps, as Bloom
(1964) and others have argued, ability becomes more fixed with age.
One would expect to find increasing correlations if the amount of
learning declined or if the fluctuation in achievement level became
less extreme.

Sociologists have endeavored to separate analytically the effects
of test score stability and unreliability. Models designed to
disentangle random measurement error from exogenous disturbances
using panel data have been proposed and estimated. Following the
lead of Coleman (1968), Heise (1969) presents an explicit causal
model for separating the effects of temporal instability from
measurement error using three waves of observations. His model, which
assumes a constant test-retest reliability, is explicated in terms of
standardized coefficients. Wiley and Wiley (1970) present an
alternative model, which assumes constant error variance over time and
thus permits observed test score reliability to increase as a function
of sample heterogeneity. Werts, Joreskog, and Linn (1971) present
data from four waves of achievement scores and attempt to test the
plausibility of the two alternative models. Their conclusion is
somewhat equivocal; the assumption of a constant reliability yields
superior statistical fit but unreasonable parameter estimates, while
the assumption of equal error variances yields estimates
that are more consistent with theoretical expectations.
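
The logic of the three-wave approach can be sketched as follows. The algebra assumes the commonly cited solution for a constant-reliability model with uncorrelated errors, in which each observed correlation is the product of the reliability and the intervening true score stabilities; the function name and the example values are hypothetical.

```python
def three_wave_estimates(r12, r13, r23):
    """Recover reliability and stabilities from three observed correlations.

    Assumes r12 = rel * s12, r23 = rel * s23, and r13 = rel * s12 * s23,
    that is, constant reliability and errors uncorrelated across waves.
    """
    reliability = r12 * r23 / r13
    stability_12 = r13 / r23
    stability_23 = r13 / r12
    return reliability, stability_12, stability_23

# Build observed correlations from assumed parameters, then recover them.
assumed_reliability, assumed_s12, assumed_s23 = 0.80, 0.90, 0.95
r12 = assumed_reliability * assumed_s12
r23 = assumed_reliability * assumed_s23
r13 = assumed_reliability * assumed_s12 * assumed_s23

reliability, stability_12, stability_23 = three_wave_estimates(r12, r13, r23)
print(round(reliability, 3), round(stability_12, 3), round(stability_23, 3))
```

The recovery is exact only because the example data were generated under the model's own assumptions; if reliability changes between waves or errors are serially correlated, the same arithmetic yields biased values.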

Each of these models depends on the critical assumption that
errors of measurement are uncorrelated with true scores and are
serially uncorrelated. Wiley and Wiley (1974) extend these models
to include serially correlated errors and demonstrate that if one
assumes equal error variances and constant regression coefficients for
both true scores and errors of measurement during fixed time
intervals of equal length, it is possible to estimate error directly.

In general, each model suggests that the stability of true scores is
substantial and likely to be underestimated by conventional
corrections for attenuation. If one assumes serially correlated errors
having the structure suggested by Wiley and Wiley (1974), estimates of
reliability are consistently lower than they are under the assumption
of either constant error variance (Wiley and Wiley, 1970) or constant
reliability (Heise, 1969). Wiley and Wiley (1974) note, however, that
an adequate model of correlated errors should link the components
of measurement error directly to the conditions of measurement.

Psychometricians have extensively examined errors of measure- 
ment on tests. Analysis suggests that empirically derived errors 
of measurement for test scores are not independent of true scores, 
nor do they appear to be normally distributed. In an interesting 
empirical paper, Lord (1960) concludes that errors of measurement 
tend to be a function of true score and to be significantly skewed. 
In the high ability group, the error variance decreased as true scores 
increased and errors were negatively skewed; in the low ability group, 
the error variance increased with true scores and the distribution of 
errors was positively skewed. Lord began by proposing to test the 
assumption of uncorrelated, normally distributed errors adopted by 
classical test theory; although he found substantial reason to view 
these assumptions as unwarranted, he does not conclude by question- 
ing the theory. Later, Lord and Novick assert that these empirical 
results "seem reasonable, or can be plausibly rationalized, as a 
consequence of a floor effect and a ceiling effect" (1968:233), 
rather than as a challenge to the theory. 

It is possible to derive a model of measurement error for test 
scores that fits empirical data far more plausibly than the classical 
model does. Psychometricians have long known that multiple-choice 
exams invite guessing by subjects and that this is a major source of 
error. A considerable literature exists on formula scoring procedures 
designed to correct for guessing by subjects. The random guessing 
model is one of the most elementary. It is assumed that the subject’s 
total score, X, consists of K correct responses to items that are 
known and G correct guesses. If there are n items on an examination, 
each with A possible answers, and the student guessed randomly on 
every item, omitting none, the expected score would be equal to 


E(X) = n/A. (1.5)

Despite such drawbacks, a model of random guessing does provide
a means of conceptualizing and quantifying measurement error. Such 
a model does not, however, justify the assumption that errors are 
uncorrelated with true score. 

One might posit a model of observed scores such that 


X_i = K_i + G_i + e_i, (1.6)

where X_i is the observed score, K_i the number of items known,
G_i the correct guesses, and e_i a random error term. If one assumes
random guessing,

E(G) = p(n - K), (1.7)

where p = 1/A. If G is a binomially distributed variable, and
q = 1 - p, it can be shown that

Var(G | K) = pq(n - K) and (1.8)

Var(G) = pq[n - E(K)] + p^2 Var(K). (1.9)


The random guessing model implies that G is a function of (n - K);
the larger the number of items known, the smaller the contribution
of random guessing to the observed score is likely to be. If the
knowledge level of a group increased, one would expect to find a
corresponding decrease in guessing. The covariance between K and
G would be negative.

In order to generate estimates of the expected variances and
covariances, the model was simulated by computer. A normal deviate,
K_1, was generated with a mean and variance set equal to those
observed in the longitudinal data analyzed. The number of test
items, n, was set at 55, and p was assumed to be .2; these are the
values of n and p for the Metropolitan Achievement Test, word
knowledge.

Four sequential simulations were run, each with a case base of
500 students. A learning curve that approximated the observed
scores was adopted in which

[equation illegible in source] (1.10)

This equation produced values of K with increasing means and
variances as a function of time. A random error term with a mean
of zero and a standard deviation equal to .3 was added to K_2, K_3,
and K_4 after the initial simulation. In this model, K_t is known,
while guessing is not. Random guessing is assumed to be a stochastic
process generating a binomially distributed random variable with a
mean equal to .2 (55 - K). The observed scores, X_it, were computed
as the sum of K_it, G_it, and e_it. Observed scores that exceeded 55
were arbitrarily set equal to 55; there were only a handful of such
cases.
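The simulation described above can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not
the original program: the mean and standard deviation of knowledge
(25 and 8) are hypothetical stand-ins for the values taken from the
longitudinal data, only a single testing occasion is generated because
the learning curve of equation (1.10) is not available here, the random
error term is omitted, and all function names are illustrative. Only
n = 55 and p = .2 come from the text.

```python
import random
from statistics import fmean

N_ITEMS = 55   # n: number of test items (value given in the text)
P_GUESS = 0.2  # p: chance that a random guess is correct (value given in the text)


def simulate_wave(n_students, mean_k, sd_k, rng):
    """Return lists (K, G, X) for one simulated testing occasion."""
    know, guess, obs = [], [], []
    for _ in range(n_students):
        # Knowledge: a normal deviate rounded to a whole number of items
        # and kept inside the range of the test.
        k = min(N_ITEMS, max(0, round(rng.gauss(mean_k, sd_k))))
        # Guessing: one Bernoulli(p) trial per unknown item, so that
        # G given K is binomial with mean p(n - K) and variance pq(n - K).
        g = sum(rng.random() < P_GUESS for _ in range(N_ITEMS - k))
        know.append(k)
        guess.append(g)
        obs.append(k + g)  # without an error term, X = K + G never exceeds n
    return know, guess, obs


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return fmean((x - ma) * (y - mb) for x, y in zip(a, b))


def corr(a, b):
    """Pearson correlation."""
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical mean and standard deviation of knowledge.
    K, G, X = simulate_wave(500, mean_k=25, sd_k=8, rng=rng)
    print("Var(K) =", round(cov(K, K), 2), " Var(X) =", round(cov(X, X), 2))
    print("r(K, G) =", round(corr(K, G), 3), " r(K, X) =", round(corr(K, X), 3))
```

With these stand-in values a run should show the pattern the text
describes: guessing is negatively correlated with knowledge, and the
observed scores vary less than knowledge does.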

The matrix of correlations generated by the simulation is presented
in Table 1-3, with the observed means and standard deviations. As
expected, the correlations between successive scores on knowledge,
K_t, are close to unity. The variance of knowledge exceeds the
observed score variance at each specified time, as it must if
the random guessing model holds. The covariances between K_t and
G_t are substantial and increasingly negative as the mean value of K
increases; these correlations range from -.547 to -.796. As the
expected score of K_t increases, the expected score of G_t declines;
since the variance of G_t is a function of the variance of K_t, which
is increasing, this value also tends to increase. The adjacent
correlations between K_t and the observed score, X_t, increase
regularly, from .869 to .929; the negative relationship between
guessing and the observed score also increases. Despite the fact that
knowledge is almost perfectly predicted from prior knowledge, the
observed score correlations suggest far less temporal stability. The
model unrealistically assumes K_t to be substantially determined by
K_(t-1); hence observed scores are more highly correlated than one
would expect empirically.

The results from the simulation suggest a perfectly plausible
explanation for the increasing correlations between pretest and
posttest observed in longitudinal data. Observed scores are attenuated
not only by random errors but also by guessing. The dynamics of
cognitive growth increase the variance of knowledge among subjects,
while a test of fixed length has only a limited number of
opportunities for guessing. Since guessing is inversely related to
knowledge, the expected covariances between guessing and both
knowledge and the observed scores are negative. The total variance in
observed scores is a function of both guessing and knowledge; the
model implies that the variance explained in a posttest by a pretest
will increase as a function of the expected values. This result, as we
have seen, is the empirical pattern observed on parallel forms over
time.

Conventional reliability theory assumes that it is possible to
decompose the observed variance into a component due to true
score and a component due to random error that are independent.
Errors of measurement under such a model can only deflate the
observed relationships. The simulation of guessing raises certain
questions about reliability estimates that are not easily resolved.
First, if the level of knowledge constrains errors due to guessing, one 
would expect to find that estimates of test score reliability increase 
with knowledge even though the instrument is unaffected. More- 
over, one would observe high levels of reliability for groups with 
larger mean scores. The measured determinants of achievement 
would increase in magnitude at successive times because of the 
increasing relationship between knowledge and observed scores. 
The guessing model implies that the observed correlation between 
how much a student knows and any presumed cause of this knowl- 
edge might increase over time, even though the actual relation was 
constant. If one assumes that measurement error is uncorrelated to 
true score, one might fallaciously conclude that the relationship 
between achievement level and either prior achievement or any other 
cause of achievement increases with time. 
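This implication can be checked from the moments of the random guessing
model. The sketch below assumes the model exactly as stated, with
X = K + G, G binomial given K, and no additional error term. Under that
assumption Cov(K, X) = q Var(K) and Var(X) = q^2 Var(K) + pq[n - E(K)],
both of which follow from equations (1.8) and (1.9) together with
Cov(G, K) = -p Var(K). The function name and the means and variance
used are hypothetical.

```python
def squared_corr_kx(mean_k, var_k, n=55, p=0.2):
    """Squared correlation between knowledge K and observed score X."""
    q = 1.0 - p
    var_x = q * q * var_k + p * q * (n - mean_k)  # variance of X = K + G
    cov_kx = q * var_k                            # Var(K) + Cov(K, G)
    return cov_kx ** 2 / (var_k * var_x)


if __name__ == "__main__":
    # Hold the variance of knowledge fixed and raise only its mean.
    for mean_k in (15, 25, 35, 45):
        print(mean_k, round(squared_corr_kx(mean_k, var_k=64.0), 3))
```

Holding the variance of knowledge fixed, the squared correlation rises
with the mean, even though nothing about the instrument has changed.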

The four waves of test score data illustrate this possibility. Table 
1-4 gives the correlations, means, and standard deviations observed 
for the sample; Table 1-5 gives the regressions of posttest on pretest, 
parental education, family income, race, and IQ for these data for 
three successive time periods. Perusing Table 1-4, one notices im- 
mediately that the zero-order correlations between test scores and 
the background variables rise over time. The regressions in Table 
1-5 indicate that the explanatory power of all variables included
(R^2) increases over time; the coefficients tend to do so as well,
although the patterns are complex. As Tables 1-4 and 1-5 show
clearly, the zero-order correlations between achievement and every
other variable increase with time, as does the total explained
variance.


Figure 1-1. Hypo[...] Scores and Random Guessing as a Function of
Knowledge.

If one is willing [...] and operates in [...] it is possible to
deduce [...] given in the appendix to this chapter. [...] structural
equations for the determination of K_t,

K_1 = u_1, (1.11)

K_2 = a_21 u_1 + u_2, (1.12)

K_3 = a_32 [a_21 u_1 + u_2] + u_3, (1.13)

K_4 = a_43 [a_32 a_21 u_1 + a_32 u_2 + u_3] + u_4, (1.14)

and four sets of two equations linking knowledge and guessing to
observed scores,

X_t = β_1t K_t + β_2t G_t, (1.15)

G_t = γ_t K_t. (1.16)
The variance of observed scores, given the assumptions, is equal to

Var(X) = Var(K) + Var(G) + 2 Cov(K, G). (1.17)

The expected value for the conditional variance of guessing, G_t,
can be expressed in terms of the variance in knowledge and the
probability of a correct response, p, given random guessing (see
equation [1.8]). The variance of guessing is given by

Var(G) = pq[n - E(K)] + p^2 Var(K), (1.18)

and the covariance of guessing and knowledge is equal to

Cov(G, K) = -p Var(K). (1.19)


The proof of these last two formulas is given in the appendix to this
chapter. For any specific distribution of observed scores, the variance
in X is assumed to be completely determined by the variability in K
and G, which can be estimated from the equations provided above.
Knowledge is assumed to be a linear function of prior knowledge and
a disturbance, u_i; the coefficient, a_ij, is set equal to the
correlation between K_i and K_j. Given the model, this would be the
value necessary to reproduce the observed correlation between X_i and
X_j.
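As an illustration of how equations (1.17) through (1.19) can be used,
the sketch below computes the moments of G and X implied by a given
mean and variance of K, and then inverts the same equations to recover
the mean and variance of K from those of X. It assumes the model holds
exactly with no additional error term. The function names and input
values are hypothetical and are not the values reported in Table 1-6.

```python
import math


def implied_moments(mean_k, var_k, n=55, p=0.2):
    """Moments of G and X implied by the mean and variance of K."""
    q = 1.0 - p
    var_g = p * q * (n - mean_k) + p * p * var_k  # equation (1.18)
    cov_gk = -p * var_k                           # equation (1.19)
    return {
        "mean_x": mean_k + p * (n - mean_k),      # E(X) = E(K) + E(G)
        "var_g": var_g,
        "cov_gk": cov_gk,
        "var_x": var_k + var_g + 2.0 * cov_gk,    # equation (1.17)
    }


def knowledge_moments(mean_x, var_x, n=55, p=0.2):
    """Invert the same equations: recover E(K) and Var(K) from X."""
    q = 1.0 - p
    mean_k = (mean_x - p * n) / q
    var_k = (var_x - p * q * (n - mean_k)) / (q * q)
    return mean_k, var_k


if __name__ == "__main__":
    m = implied_moments(mean_k=25.0, var_k=64.0)  # hypothetical values
    print(m)
    print(knowledge_moments(m["mean_x"], m["var_x"]))
    # Sanity check: the round trip returns the inputs.
    assert math.isclose(knowledge_moments(m["mean_x"], m["var_x"])[1], 64.0)
```

The inversion works because, once the covariance term is substituted,
Var(X) reduces to q^2 Var(K) + pq[n - E(K)], which is linear in Var(K).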

The coefficients linking knowledge to guessing, γ_t, and to the
observed score, β_it, are estimated assuming no measurement error. The
unstandardized coefficients are therefore straightforward: γ_t =
-.2, assuming p = .2, and β_1t and β_2t are equal to unity. The
correlations between K_t and X_t, however, are less than 1.

Table 1-6 presents the expected means, variances, and covariances
based on raw scores in the longitudinal data. Figure 1-2 summarizes
the model of learning implied if knowledge operated as a simple
causal chain and the coefficients took on the values specified. The
model is overidentified, since it was not necessary to use three of the
estimated correlations; in order to reproduce the matrix of estimated
correlations, the disturbance terms, u_t, are permitted to be
correlated. These values suggest that a simple causal chain is not
adequate to describe the growth of knowledge at successive times.

Figure 1-2. Cognitive growth for longitudinal data at four given
times. Observed score, X, assumed a function of knowledge, K, and
guessing, G.

One objective of the model was to account for the increasing
correlations between pretest and posttest over time. Although the
correlations between K_(t-1) and K_t are much closer in magnitude than
are the correlations between observed scores, the estimated effects
still increase during successive intervals. The model of random
guessing tends to reduce the disparity between effects on successive
measurements but not to eliminate it.

The model implies that any cause or correlate of knowledge that 
is related to X would be attenuated by errors of measurement in- 
troduced by random guessing. Furthermore, it implies that these 
observed correlations would be likely to increase over time,
irrespective of the true effect. Using the techniques of path
analysis, it is possible to generate estimates of the relationship
between knowledge and any variable presumed to cause it. Writing Z for
such a variable, this correlation would be equal to:

r_ZK = r_ZX / r_KX, (1.20)

[...] of knowledge over time.
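The correction of equation (1.20) divides the observed correlation
between a presumed cause and the test score by the correlation between
knowledge and the test score. The sketch below checks that idea on
simulated data. The variable Z, the linear rule generating K from Z,
and every numerical value other than n = 55 and p = .2 are
hypothetical. The check also assumes that Z affects the observed score
only through knowledge.

```python
import random
from statistics import fmean


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def simulate(n_students, rng, n=55, p=0.2):
    """Z influences K; X depends on Z only through K and random guessing."""
    zs, ks, xs = [], [], []
    for _ in range(n_students):
        z = rng.gauss(0.0, 1.0)
        # Hypothetical rule: knowledge rises with Z, plus a disturbance.
        k = min(n, max(0, round(25 + 5 * z + rng.gauss(0.0, 6.0))))
        # Correct guesses: one chance-level trial per unknown item.
        g = sum(rng.random() < p for _ in range(n - k))
        zs.append(z)
        ks.append(k)
        xs.append(k + g)
    return zs, ks, xs


if __name__ == "__main__":
    Z, K, X = simulate(5000, random.Random(1))
    r_zx, r_kx, r_zk = corr(Z, X), corr(K, X), corr(Z, K)
    print("observed r(Z, X):      ", round(r_zx, 3))
    print("corrected r(Z, X)/r(K, X):", round(r_zx / r_kx, 3))
    print("r(Z, K) in the simulation:", round(r_zk, 3))
```

In the simulation knowledge is known, so the corrected value can be
compared with the correlation it is meant to estimate; with real data
the correlation between knowledge and observed score would itself have
to be derived from the model.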


The original intent of the guessing model was to account for the
observed increases in the correlations [...]
implied by the model are not sufficient to account for the increments
in true score variance observed.

I suspect that one reason the model fails to account for the 
increasing correlations in their entirety is not that the notion of 
correlated error due to guessing is wrong but that the model under- 
estimates the extent of the problem. The corrections deduced from 
the model are not large, since only errors due to random guessing 
were included; these corrections do not involve substantial modifica- 
tions of the original correlations. The model imposes a crude dich- 
otomy on the process of test taking; it is assumed that every student 
either knows or does not know the correct response. If one assumed 
instead that an even larger portion of the variance attributed to 
knowledge was due to informed guessing, based on partial knowledge, 
the model might account to a much greater extent for the increasing 
explanatory power of other variables. The logic of this argument 
assumes that, unlike the model elaborated, the behavioral phe- 
nomenon of guessing is not an entirely random process; knowledge 
increases over time, but then so does partial knowledge. The student 
who can successfully eliminate a larger proportion of wrong answers 
at time t than at time t-1 would be likely to increase his or her
score disproportionately during the interval.

Further refinements of this model of test taking that included 
errors of measurement due to guessing are possible; however, they 
would involve either more substantive information or more sophisti- 
cated assumptions about the process. The model of random guessing 
assumes that the level of knowledge is the only determinant of the 
amount of guessing; correct guesses are assumed to be a function of 
chance and of the number of items for which the answer is not 
known. This assumption is probably too simplistic. If guessing re- 
flects partial knowledge as well as random choices, one might expect 
high ability students to guess better than low ability ones and stu- 
dents with more knowledge to guess better than students with less. 
Such assumptions would further complicate estimation procedures. 

In sum, this section has argued that the assumption of errors of 
measurement uncorrelated with true scores cannot be supported 
empirically. Although such an assumption is convenient statistically, 
it does not accord with either the data available or an intuitive under- 
standing of how children take tests. Errors of measurement can be 
plausibly assumed to result from guessing; models that include 
guessing tend to yield both a more realistic conception of the process 
and a better fit to empirical data. Although such models pose
methodological difficulties not encountered by classical test theory,
they do not appear to be intractable statistically.

Explicating all the ramifications implied by the model of guessing
exceeds the scope of this chapter. In passing, however, a few of the 
more salient issues should be mentioned. First, if one assumes that 
the relevant attribute to be measured is individual knowledge and 
that it is related to the observed score in the manner described, much 
of the literature on test score reliability and corrections for attenua- 
tion should be modified. Correlated errors of measurement imply 
that one cannot decompose observed scores into a component due 
to true score and an independent error. Moreover, the variance of 
knowledge must be larger than the variance of observed scores. 

Second, if correct responses due to guessing are inversely related
to the number of items known, one would expect heteroscedastic
errors. Students higher on the scale have both more knowledge and
fewer occasions to guess; knowledge would contribute an increasing
share of the variance of observed scores as one moves up the
scale, while the proportion of the variance due to guessing would
decline. I suspect that an empirical puzzle posed by studies of
reliability could be explained by such correlated errors. Students
who have higher scores tend to exhibit more predictable outcomes
than do low-scoring students. Thus one finds that test scores are
more “reliable” for whites than for blacks, despite the fact that
measures scale equally well for both groups and that observed test
scores are often equally variable among both groups. If the level of
knowledge is consistently related not only to observed scores but
also to errors of measurement due to guessing, this is the pattern one
might expect.

Third, correlated error implies that test forms of varying difficulty
cannot be equally valid or reliable measures of true knowledge.
Procedures for vertically equating tests typically assume that a simple
linear transformation of items should suffice. This would not be true
unless errors were uncorrelated; I suspect that this is one reason such
procedures have not proved very satisfactory (Goulet, Linn, and
Tatsuoka, 1975).

Finally, the presence of correlated [...] error [...] implies that
the level of knowledge is not independent of the causes of knowledge
[...] the observed increase in cor[...] one would not expect
stab[...]. As aggregate means increase, so do [...] variance due to
[...] knowledge. Thus, the issues of metrics and distributions return
through the back door. [...] traditionally been satisfied, at least
partially, by the [...] stability of measured effects and have argued
that [...]ment presented no great problem for this reason. However,
this would be true only if errors of measurement were uncorrelated
with true scores.


CONCLUSIONS 


Cognitive growth, it has been argued, must be assessed in the context 
of longitudinal data. While it is clear that there is no substitute for 
accurate and reliable longitudinal data, achievement scores over time 
raise as many questions as they answer. 

Viewed in a longitudinal framework, test scores do not lend much 
support to the basic assumptions of classical test theory. I have 
argued that it is impossible to measure growth or change without 
imposing specific interval assumptions or assumptions about the
distribution of true scores; the assumption that the appropriate
distribution is normal or that the intervals observed are identical on true
and observed scores seems very weak. If one assumes that metrics 
for learning must meet minimal criteria of logical and substantive 
validity, it is clear that a skewed distribution with unequal intervals 
provides a more credible picture of growth. Furthermore, the four 
basic assumptions of test theory regarding errors of measurement do 
not seem tenable. If guessing contributes to measurement error in 
the manner specified, one would not expect errors to be uncorrelated
with true scores or with each other; if errors due to random guessing
were a function of “true” knowledge, as posited, their expected 
values would not be zero, and their standard deviations could not be 
equal. The model of guessing is deficient in several ways; yet, I would 
argue, it provides a more plausible explanation of the patterns found 
in longitudinal data than alternative theories do. 

Among educational researchers, there is a tendency to apologize 
for one's data but never for one's model. If the evidence presented 
here is convincing, it suggests that basic revisions in test theory are in 
order. The single most important item on the educational agenda is 
constructing and validating tests that can measure learning reliably; 
there are substantial reasons to doubt that those available can do so. 
Without such measures, educational research is problematic and
educational evaluation a charade. The assumptions necessary for
studying cognitive growth are enormously consequential, as I have
tried to show, and fundamentally affect the results obtained. They
must be examined both theoretically and empirically. If this study
inspires concern about these issues, at least one objective will have
been met.


APPENDIX
The model for guessing is given by

X_i = K_i + G_i,

where X_i is the observed score,
K_i is the student's true knowledge,
G_i is conditionally binomially distributed given K_i,
with E(G_i | K_i) = p(n - K_i),
and Var(G_i | K_i) = pq(n - K_i).

Var(G) = E(G^2) - [E(G)]^2,
= E[E(G^2 | K)] - [E(G)]^2,
= E[Var(G | K) + E(G | K)^2] - [E(G)]^2,
= E[pq(n - K) + p^2 (n - K)^2] - {p[n - E(K)]}^2,
= pq[n - E(K)] + p^2 E[(n - K)^2] - {p[n - E(K)]}^2,
= pq[n - E(K)] + p^2 [n^2 - 2n E(K) + E(K^2)] - {p[n - E(K)]}^2,
= pq[n - E(K)] + p^2 {n^2 - 2n E(K) + Var(K) + [E(K)]^2} - p^2 [n - E(K)]^2,
= pq[n - E(K)] + p^2 {[n - E(K)]^2 + Var(K)} - p^2 [n - E(K)]^2,
= p^2 Var(K) + pq[n - E(K)].

The solution for the covariance between knowledge, K, and guessing,
G, is given by:

Cov(G, K) = E(GK) - E(G) E(K),
= E{K [E(G | K)]} - E(G) E(K),
= p[n E(K) - E(K^2)] - p[n - E(K)] E(K),
= p{n E(K) - [E(K)]^2 - Var(K)} - p[n - E(K)] E(K),
= p[n - E(K)] E(K) - p Var(K) - p[n - E(K)] E(K),
= -p Var(K).


NOTES


1. For more details regarding the tests and sample, see Chapter 2 and Ap- 
pendix A of Heyns (1978). The substantive problem for which these data were 
gathered involved estimating the effects of schooling by contrasting the level and 
the determinants of achievement during the school year and the summer. Ex- 
posure to education was viewed as the "treatment"; cognitive growth in the ab- 
sence of schooling was taken to reflect the effects of family and peers, while the 
pattern of learning during the school year was assumed to be the result of 
family, peers, and schooling. 

2. Grade equivalent and standard scores for the norming population repre- 
sent the published values corresponding to the means for a given raw score; they 
are not the mean scores calculated for individuals in the population. The grade 
equivalents are, therefore, arbitrary scale points yielding, by definition, equivalent 
gains for fixed intervals of time during either school year. 

3. Psychometricians have also warned against the indiscriminate use of grade 
equivalent scores to compare children across several grade levels. It is certainly 
true that a fifth grade child who tests at the tenth grade level is not equivalent 
in any meaningful way to an average tenth grader; however, the objection that 
a posttest is not an equally valid indicator of the skills measured on a pretest 
applies whatever the time interval or the age of the child. My concern is to show 
the patterning of achievement between nominally parallel forms taken over 
intervals of a year or less. I would not recommend the extrapolation of scores 
beyond these limits. Moreover, the construction of grade equivalent scores in- 
volves assumptions and procedures that can be criticized on several grounds; 
as with any measure adopted for research purposes, the analyst must scrutinize 
the resulting measure in light of the objectives. 

4. Simple gain scores are generally considered suspect by experienced
analysts, although one can still find articles recommending their use
(Richards, 1975). Regression effects are, under conventional
assumptions, ubiquitous. One commonly finds that the correlation
between a pretest, X_1, and the gain, X_2 - X_1, is negative. It is
easy to show that this correlation must be negative whenever the [...]
regression coefficient, b_X2X1, is less than unity. While it would
not be unreasonable to argue that a regression slope greater than 1 is
a defensible criterion for learning, unstandardized coefficients tend
to be smaller when estimated by conventional techniques. Alternative
estimation procedures, such as weighted least squares, seem preferable
for such variables (Heyns, [...]


Production Technologies and
Resource Allocations within 
Classrooms and Schools: 
Theory and Measurement* 


Byron W. Brown and Daniel H. Saks 
Michigan State University 


There are many ways to look at schools, but the model that has
received the most attention recently is that of the school as a
utility-maximizing firm using labor (teachers) and capital (equipment
and buildings) to process (educate) raw materials (students).
Unfortunately, in most previous work such a model has been more a
useful analogy than a powerful analytical device. And it has been a
positively misleading analogy when insignificant regression
coefficients for purchased school inputs in an “input-output” equation
were taken to be indicators of zero marginal products for those inputs
(cf. Averch et al., 1972).

In this chapter we hope to clarify just how the simple economic
production model needs to be modified to handle the essential
elements of educating students in schools. It is our belief that such
an approach allows a deeper analysis of allocative choice and that
this will help in evaluating alternative technologies, testing
management rules of thumb, and generally improving the productivity
and operations of schools.

We will cover three levels of analytic complexity: (1) the tech- 
nology of a single output and multiple inputs; (2) the problem of 
introducing tastes in the multiple output, multiple input case with no 
joint products and correctly measured inputs; and (3) the problems 
of classroom organization in the case of multiple inputs and outputs 
with joint production and/or improper measurement of inputs. For 
each of these categories, there are two levels of observation—the 
individual student and groups or classes of students. We assume at 
the outset that our model is deterministic, although we will have a 
bit to say about stochastic elements of the problem including un- 
certainty, disequilibrium, and person-specific “inefficiency” of pro- 
duction. At the appropriate points we will cite the relevant literature, 
though we make no attempt to survey the literature for its own sake 
since others have recently done that.¹

Consider first the case of the simple textbook production func- 
tion, or what Keeney and Raiffa (1976) call the “simple value prob- 
lem,” where there is only one output and many inputs. Most of the 
previous literature has concentrated on this case. Researchers have
[...] stochastic structure of such functions. Unfortunately, this case
is most clearly relevant to the education of a single child in a single
subject by a tutor. Its relevance to modern schooling in America
frankly escapes us.

[...] student output data in terms of this model. The analysis in
these studies may suffer from serious measurement error [...]. These
problems are not solved by using data on [...]. Even if the model,
particularly the separability assumptions about [...] learning curves,
is accepted, we [...] how to do a time allo[...] might yield useful
results.

[...] assumption of separable individual production functions is that
it assumes away all the interesting class organization and management
problems, since it makes no difference how students are grouped or
tracked. There is still, of course, an interesting resource allocation
problem in such [...] down to a problem of how to schedule the
application of [...] inputs to the particular students. In thinking
about this, we have found it instructive to consider the case of the
job shop model of production. This model is the polar case from an
assembly line. In an assembly line, each item produced goes through
the same sequence of processes (e.g., automobile production). In a job
shop, each item produced may, depending upon the requirements, go
through different processes in different sequences (e.g., an
automobile repair shop). These models are too complex for analytical
solutions, but they do lend themselves to numerical solutions and
could provide the basis for simulation models and testing places
for various rules of thumb about teaching strategies. With some
simplification, models that contain many of the same attractive
features can be handled as dynamic programming problems.
The interesting problems of classroom organization do not arise
until we turn to case three: multiple outputs with joint production.
We will present a theoretical discussion of the meaning and
significance of productive jointness in the multiproduct case.
Jointness, roughly speaking, is an interdependence in the learning
curves of individual students, a sharing of the same inputs. We cannot
assign inputs to outputs in such cases, and this is why it is often
indistinguishable from the case where we have difficulty measuring the
input applied to a particular output. In one case the measurement
is theoretically impossible, and in the other it is impractically
difficult. Each student's learning curve may vary with, among other
things, how the student is grouped with others having particular
characteristics. We illustrate the inherent difficulty of specifying
and measuring inputs to individual students when jointness obtains.
Once jointness and multiple outputs are admitted, it is the
intraclassroom allocation problems that become most important. An
important and neglected issue is how such allocations can be
achieved when students are not passive about their assignments. We
develop these notions in terms of a discussion of the student's supply
of effort. We conclude with a brief discussion of the implications of
these notions for the design of our current research.


SIMPLE PRODUCTION 


The economist's concept of the production function forms the basis 
of the economic theory of the firm. The function is a summary 
description of the physical realities of production processes where 
inputs or resources are turned into outputs or final products. The 
textbook version of production theory usually begins with the case 
of a single output being manufactured from a single input. As a 
pedagogical device, this enables the student to become familiar with 
some elementary jargon that will be useful in talking about more 
complicated cases.² Suppose y = f(x), in which x is a quantity of
some input and y the quantity of output. Economists always assume 
that the function yields the largest value of y obtainable from some 
x, given the state of the art in production. Of course, there are many 
reasons why there may be inefficiency in production or a failure to 
get the most output from a given input, so we may never observe 
the values of x and y in the function. This is a problem that has been 
neglected for the most part in both the educational and economic 
literature. But see Levin (1974), and Schmidt, Aigner, and Lovell 
(1977) for examples of work that takes the problem seriously. 

What does y = f(x) look like? The function, called the total
product curve, is usually assumed to be continuous, differentiable
at least twice, nondecreasing, and to satisfy f(0) = 0. The first
derivative of f(x), dy/dx or the slope of the total product curve, is
the marginal product of x [...]

Figure 2-1. Total and marginal product curves. [Upper panel: output
y = f(x) against input x; lower panel: marginal product dy/dx against
input x.]


[...] at this point is that the basic theoretical notion requires both
input and output homogeneity.

We could use total product curves like those in Figure 2-1 to
describe the process of learning through time. If the input is the
amount of time a student spends in a particular activity and the
output is the amount learned, the relation between them can be
described using the production function notion. In fact, the relation
probably looks very much like Figure 2-1, including the idea of
diminishing returns. But time alone does not produce learning output.
Other inputs are always present in the production process. Some of
these inputs may be variable for the problem we consider and some 
not. Introducing additional inputs presents no serious complication, 
but it does increase the amount of jargon we need to discuss the 
issues. Suppose we assume there are two inputs and a single output:
y = f(x_1, x_2), where x_1 and x_2 are inputs measured, as before, in
their natural units. By holding x_2 constant, we can vary x_1 and find
the marginal product of input 1, MP_1. The law of diminishing returns
is again assumed to hold. But with two variable inputs, we can ask 
about the extent to which they can be substituted for each other in 
production. More precisely, how much of input 2 does it take to 
make up for the loss of a small amount of input 1? The answer is 
dubbed the “marginal rate of substitution” (MRS) between the 
inputs and can be readily shown to equal the ratio of the marginal 
products of the inputs (e.g., Ferguson and Gould, 1975). A set of 
points relating quantities of the inputs for which output is constant 
is called an isoquant. Several isoquants are shown in Figure 2-2. 
As we move along an isoquant in the direction of using more x_2, the
marginal rate of substitution or the absolute value of the slope of the
isoquant increases. That is, the isoquants are convex when viewed
from the origin. Convexity means that as you use less and less of an
input it becomes more and more difficult to substitute for it. Put
another way, it is easier to produce with a mixture of productive inputs.

Figure 2-2. Isoquants for a process that produces one output from two
inputs. Axes: x_1 (input 1), x_2 (input 2).

While they need not do so, isoquants are usually assumed to fill 
up the positive quadrant of Figure 2-2. The more northeasterly an 
isoquant, the larger the quantity of output associated with it. 

To illustrate the kind of education problems for which this may 
be useful, suppose that our inputs are the daily amounts of time 
spent being tutored by a teacher and time spent doing seatwork.* 
Output is the amount of some skill acquired. The isoquants of Fig- 
ure 2-2 are probably a pretty good representation of the way sub- 
stitution takes place. Substitution is possible, but the activities are 
not perfect substitutes. More of either tutoring or studying in- 
creases learning, however. 


A straightforward question to ask here is: If a fixed amount of
time, T, is available, how should it be allocated between tutoring and
seatwork if we want to maximize learning? If we denote by T_s and
T_t the time spent on seatwork and tutoring, respectively, then any
allocation that does not waste time must satisfy T = T_s + T_t or
T_s = T - T_t. For a given T, this is a linear relation between T_s and
T_t and appears as the straight line in Figure 2-3. If we superimpose on
Figure 2-3 our isoquants from Figure 2-2, we can see that the optimal
allocation is at point A, where the isoquant y* is tangent to our
time constraint line. That is the highest isoquant that could be
reached without exceeding the constraint. Since the slope of the
constraint is -1 at that tangency, the marginal rate of substitution
(minus the slope of the isoquant) must equal +1. And since the MRS
is the same as MP_t/MP_s, our optimizing condition is simply that the
marginal products of the two activities should be equal and that the
time allocation should be adjusted to accomplish this.
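
The tangency condition can be checked numerically. What follows is only a sketch under stated assumptions: the Cobb-Douglas form of the learning function, its exponents, and the five-hour budget are all hypothetical choices made for illustration and do not come from the text. The sketch searches along the constraint T = T_s + T_t and then compares the two marginal products at the best allocation found.

```python
# Hypothetical learning technology: y = T_t**a * T_s**b. The functional
# form and the exponents are assumptions made for illustration only.
A_TUTOR, B_SEAT = 0.6, 0.4


def learning(t_tutor, t_seat):
    """Skill acquired from tutoring time and seatwork time."""
    return (t_tutor ** A_TUTOR) * (t_seat ** B_SEAT)


def marginal_products(t_tutor, t_seat, h=1e-6):
    """Central-difference estimates of MP_t and MP_s."""
    mp_t = (learning(t_tutor + h, t_seat) - learning(t_tutor - h, t_seat)) / (2 * h)
    mp_s = (learning(t_tutor, t_seat + h) - learning(t_tutor, t_seat - h)) / (2 * h)
    return mp_t, mp_s


def best_allocation(total, steps=10_000):
    """Grid search along the time constraint T = T_s + T_t."""
    best_i = max(
        range(1, steps),
        key=lambda i: learning(total * i / steps, total - total * i / steps),
    )
    t_tutor = total * best_i / steps
    return t_tutor, total - t_tutor


T = 5.0  # assumed daily time budget, in hours
t_tutor, t_seat = best_allocation(T)
mp_t, mp_s = marginal_products(t_tutor, t_seat)
print(f"T_t = {t_tutor:.3f}, T_s = {t_seat:.3f}")
print(f"MP_t = {mp_t:.4f}, MP_s = {mp_s:.4f}, MRS = {mp_t / mp_s:.4f}")
```

With these assumed parameters the two marginal products coincide at the chosen split, so the MRS equals 1, the absolute slope of the time constraint. A different functional form would move the optimal split but not the condition itself.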

If we knew the parameters of the production function, we would 
be in a position to judge whether a particular time allocation was 
optimal or not. We could also put any widely used rules of thumb 
to the test and see how closely they approximate the optimal
allocation. These are interesting questions but are not ones that a
person studying teaching is likely to ask or to be much interested in
answering. That is because the technology of production suggested by
this model stands in such stark contrast to the real world of teaching,
in which students are taught many subjects, sometimes at once, and
in which pupils of very different backgrounds and abilities are taught
together in classrooms, not in one-on-one tutorials.

Figure 2-3. A learning maximizing time allocation for a student in a
single subject. Axes: T_s (study time) and T_t (tutoring time).


It is worth noting, however, that this single output, multiple input 
model has been widely applied to data generated from actual class- 
room activities. These studies, many of which are discussed by 
Hanushek (1978) and Lau (1978), are remarkable for their use of 
various kinds of aggregation techniques to get rid of the problems 
posed by the inherent diversity of subject matters and students. 
Subject matters are aggregated into general math or reading achieve- 
ment scores for measures of individual student outputs (something 
we do not find particularly objectionable). Students are aggregated 
so that class average scores are sometimes used to measure output, a 
serious difficulty that we examine later. Furthermore, the input 
measures used may often have only a vague relationship to the actual 
inputs applied to the students whose scores are being predicted. 


PRODUCTION WITH MULTIPLE OUTPUTS


To see why these problems are important and why all studies (including
our own) have failed in some important way to isolate and measure
accurately the input productivities, we must develop the production
model in such a way as to approximate more closely the world in
which instruction takes place, namely, in classes in schools.

Schools are places where groups of students are brought together 
for the purpose of accomplishing some activities. Characteristically, 
students are placed in subgroups within schools (usually called 
classes) and may be subdivided further and remixed for different 
activities in an almost infinite variety. We shall assume that in a 
school, or in a subgroup within it, the activities and accomplish- 
ments of each student count for the school decisionmakers. They 
are educating all of the students, though not necessarily equally. 
This is a remarkable assumption, for it forces us to rethink the 
simple one output, many input production model presented above. 
If thinking of a school as a firm with a production function is a use- 
ful analogy, we must concede that a school is certainly a multi- 
product firm, because it has many students. 

It turns out that describing the production technology for a
multiproduct firm is not nearly so simple as it is for the single
output, multiple input case. The most difficult cases, both for
theory and for measurement, arise when the several outputs of a firm
are not independently produced but are technically related to
each other in some fashion. This is too bad, because these cases are
also the ones we think most relevant and interesting for
understanding how schools work.

Our discussion of the multiproduct (many student) case has two
parts. The first is production of a single attribute in students when
there is independence among students. The second part deals with
joint production.

Having more than one product or student need not present any
serious theoretical problems. In the case of two students, we might
have

    y_1 = f(t_1), and
    y_2 = g(t_2),                                            (2.1)

where y_1 is the first student's learning of some skill and t_1 the
amount of input, t, applied to student 1. We define y_2 and t_2
similarly, and the forms of the functions may be different for the two
students, perhaps because they have different abilities. In addition,
we must have

    T = t_1 + t_2,                                           (2.2)

where T is the total amount of some input available to both students.
It may be the total teacher time available for instruction or the 
amount of some other factor that can be divided between the stu- 
dents. Equation (2.2) says that the total amount of the input is the 
sum of the amounts applied to the students. Put another way, in- 
creasing the input applied to student 1 by one unit, given the total 
input available, implies decreasing the amount applied to student 2 
by one unit. It is this condition that gives us separability in produc- 
tion. If our t’s are really teacher's time, we have described a class- 
room technology, albeit one with only two students, where instruc- 
tion takes place by the tutorial method. The students need not be 
physically together in a school; they could just as well be in their 
own homes, with the teacher coming to them. The students do not 
interact in any way. 

We are in a position to ask a question for this case that we could 
not ask before—indeed, one which the earlier production model pre- 
cluded us from asking. How should the teacher's time (or any other 
resource) be allocated between the two students? Before, even when 
there were different activities, we had only the problem of maxi- 
mizing one student's learning. This problem is an order of magnitude 
more difficult. Economists have devised two ways of solving it, 
which we might call the market solution and the utility function 
solution. Both involve techniques for comparing the worth of
additional units of achievement for student 1 with that of additional
units for student 2. The techniques differ only in where they find 
the source of the values. The market solution has been applied to 
firms that sell their products. The prices they can get serve as aggre- 
gation weights for valuing the different outputs. In our example, 
with a fixed amount of some input, the objective of a firm might 
be to maximize its total sales receipts, R. The problem is to: 


    maximize:    R = P_1 y_1 + P_2 y_2

    subject to:  y_1 = f(t_1),
                 y_2 = g(t_2),
                 T = t_1 + t_2.

If the prices of the outputs, P_1 and P_2, are known and fixed, this is a
straightforward problem. In fact y_1 and y_2 should be chosen so that
P_1/P_2 = (∂g/∂t_2)/(∂f/∂t_1), where ∂g/∂t_2 and ∂f/∂t_1 are the marginal
products of time in y_2 and y_1, respectively.

While the revenue maximization problem can be solved directly
by using standard techniques (for a multitude of examples, see 
Henderson and Quandt, 1971), a somewhat different way of looking 
at it will be especially helpful. Pretend the problem has two parts. 
The first part consists of finding all the output combinations (y_1, y_2)
obtainable for a given total input level, T. We are particularly
interested in the combinations for which we have the maximum output
of one student for each level of output of the other, that is, the
combinations that solve the problem:

    maximize:    y_1 = f(t_1),

    subject to:  y_2 = g(t_2),
                 T = t_1 + t_2.


Since there are only two outputs here and one input that is
allocated to the two goods, the problem is trivial. Once a level of y_2
is chosen, there are no extra degrees of freedom in the system; the
only solution is the optimal solution. This solution consists of an
equation of the type y_1 = h(y_2, T), which in economists' jargon
is called a product transformation curve or a production possibilities
curve. Typical examples of such curves are shown in Figure 2-4. CC'
contains all the output combinations for a given level of resources,
while DD' is the same curve with a greater resource endowment.

If y_1 = h(y_2, T) represents the possible combinations we can get
along DD', we can once again ask the earlier question, Which (y_1, y_2)
combination will maximize total revenue (R = P_1 y_1 + P_2 y_2)? The
(y_1, y_2) combinations for a given level of revenue are a straight line
in output space, y_2 = R/P_2 - (P_1/P_2)y_1. The trick is to find the
"isorevenue line" that maximizes R (i.e., has the greatest intercept)
consistent with the production possibilities curve. As Figure 2-4
shows, this is accomplished for the isorevenue line EE', which is just
tangent to the production possibilities curve. For tangency, the
slope of the isorevenue line, -P_1/P_2, must equal the slope of the
production possibilities curve, dy_2/dy_1. But from the production
problem we examined earlier, we can see that this slope is
-MP_2/MP_1, the ratio of the marginal products of the input in the two
outputs. The numerical value of the slope of the production
possibilities curve is usually called the (marginal) rate of product
transformation (RPT) and is the amount of one output gained by giving
up one unit of another. The RPT is like a production rate
and plays a crucial role in the theoretical development that follows.

Figure 2-4. Production possibilities curves for two outputs and
different levels of input use. EE' is an isorevenue curve. Axes:
y_1 (output 1), y_2 (output 2).

The whole point of this analysis was to show that the original 
maximization problem could be broken up into two subproblems. 
Solving each subproblem in order would yield the same solution as 
the original. The division of the problem allows us to construct the 
production possibilities curve that reveals the alternative outputs 
we could have. Revenue maximization gave one answer to the 
question of what we ought to have but relied on the presence of 
market prices. 
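
The equivalence of the one-step and two-step solutions can be illustrated with a small numerical sketch. Everything specific in it is an assumption made for illustration: the square-root learning functions f and g, the prices, and the input endowment are hypothetical, and the helper names are ours. The sketch solves the revenue problem directly over t_1, then solves it again by first writing down the production possibilities curve (here expressed as y_2 in terms of y_1) and searching along it.

```python
import math

# Hypothetical learning functions for two students (assumptions for
# illustration only): y_1 = f(t_1) = sqrt(t_1), y_2 = g(t_2) = 2*sqrt(t_2).


def f(t):
    return math.sqrt(max(t, 0.0))


def g(t):
    return 2.0 * math.sqrt(max(t, 0.0))


def argmax_concave(func, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if func(a) < func(b):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)


def direct_solution(p1, p2, total):
    """Maximize R = P_1 f(t_1) + P_2 g(T - t_1) over t_1 in one step."""
    t1 = argmax_concave(lambda t: p1 * f(t) + p2 * g(total - t), 0.0, total)
    return f(t1), g(total - t1)


def frontier(y1, total):
    """Production possibilities curve implied by f, g, and T."""
    return g(total - y1 ** 2)  # t_1 = y_1**2 because y_1 = sqrt(t_1)


def two_stage_solution(p1, p2, total):
    """Stage 1 is the frontier; stage 2 picks its revenue-maximizing point."""
    y1 = argmax_concave(
        lambda y: p1 * y + p2 * frontier(y, total), 0.0, math.sqrt(total)
    )
    return y1, frontier(y1, total)


def rpt(y1, total, h=1e-6):
    """Numerical value of the frontier's slope, |dy_2/dy_1|."""
    return abs(frontier(y1 + h, total) - frontier(y1 - h, total)) / (2 * h)


P1, P2, T = 2.0, 1.0, 10.0  # assumed prices and input endowment
direct = direct_solution(P1, P2, T)
staged = two_stage_solution(P1, P2, T)
print("direct:   ", direct)
print("two-stage:", staged)
print("RPT at optimum:", rpt(staged[0], T), "price ratio:", P1 / P2)
```

Under these assumptions both routes arrive at the same output pair, and the RPT at that pair matches the price ratio, which is the tangency between the isorevenue line and the production possibilities curve.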

The problem of which output combination to choose for classes
of students is devoid of market prices. However, the alternative ways
to allocate time or resources are still present. This brings us to a step
in the analysis of production often overlooked by other researchers— 
the necessary introduction of nonmarket values. It is less important 
to us whose values are brought to bear to choose a point on the pro- 
duction possibilities curve than it is to realize that we cannot escape 
a value judgment in this case. It is this above all else that separates 
the single output from the multiple output case—the need to make a 
nontrivial value judgment to find an optimal mix of outputs.


THE UTILITY FUNCTION 


Economists solve the problem of subjective value, as opposed to 
market value, by postulating a utility function. This is simply a rule 
that assigns a unique level of satisfaction or utility to every com- 
bination of goods or services a person might consume. Each in- 
dividual is assumed to have a utility function that may or may not 
be similar to other people’s. Such functions are maps embodying 
information about people's preferences.

Consider the utility function for some person trying to decide
how to allocate resources to the children whose possible learning
outcomes are shown in Figure 2-4. More learning for each might
reasonably be a good thing. But our decisionmaker might be willing
to trade some learning by one of the students in exchange for a large
enough increase in learning by the other.

These basic notions about tastes or the shape of the utility
function are often described by economists using the device of the
indifference curve. If our decisionmaker's utility function is
U = U(y_1, y_2), an indifference curve is a locus of combinations of y_1
and y_2 for which the level of satisfaction, U, is constant. Several
indifference curves are shown in Figure 2-5. Higher indifference
curves, those denoting higher levels of satisfaction, lie above and to
the right of other curves. The curves are negatively sloped and convex
when viewed from the origin in order to denote the increasingly
difficult substitution possibilities as people have a great deal of one
commodity and little of another. The convexity of indifference curves
is a way of bringing in the idea that people value variety (see Fair,
[...]). In the simple example of two outcomes, it is easy to see which
alternative to choose if we want to maximize total utility. Simply
choose the outcome mix for which the production possibilities curve
is tangent to an indifference curve. If the production possibilities
curve and the indifference curves have the slopes illustrated here,
there will be a unique optimal outcome.

Figure 2-5. Typical indifference curves for a decisionmaker with two goods.

Economists, unlike psychologists, have not devoted much effort
to explicit estimation of utility functions. Since the only safe
assumption is that everyone's function is different, some sort of
laboratory experiments have been thought to be necessary to reveal
preferences. Maybe there is the prospect of some progress in this 
work along the lines suggested by Keeney and Raiffa (1976). To our 
knowledge, the only serious attempt to apply their notions to 
schools (although mostly at the school or district level) is Roche's 
(1971). 

But if tastes are not readily measurable (see Klitgaard [1975] for 
a particularly pessimistic view), why do we raise the notion here? 
Because it makes clear that choices among alternatives cannot be 
escaped in the multiple output case and that not all alternatives are 
likely to be equally desirable. It is introduced also to show that 
optimizing behavior is a useful framework for analyzing outcomes.
If some outcomes are better than others, does a best outcome exist,
and what is it? This is standard material for economists, and we beg
their forbearance if any of them plodded through the preceding
pages. It was a mild surprise that in the course of our research, as we
spoke with more and more teachers, psychologists, and educators, we
found the economic model of optimization under constraints greeted 
with either tenacious skepticism or unbridled admiration. One of our 
important tasks is to try to convince those in other disciplines of the 
virtues of the economic methodology while at the same time learn- 
ing from them the limitations of the approach for education prob- 
lems. But if the methodology is to be used, the assumptions must be 
clearly understood.5 
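
The tangency rule of this section, choosing the outcome mix at which the production possibilities curve touches the highest attainable indifference curve, can be sketched numerically. The frontier, the logarithmic utility function, and the endowment below are hypothetical and chosen only for illustration; in particular, nothing here is an estimate of any actual decisionmaker's tastes.

```python
import math

# Assumed technology and tastes, for illustration only:
#   frontier: y_2 = 2*sqrt(T - y_1**2)
#   utility:  U(y_1, y_2) = ln(y_1) + 2*ln(y_2)


def frontier(y1, total):
    """Largest y_2 attainable for a given y_1 and total input T."""
    return 2.0 * math.sqrt(max(total - y1 ** 2, 0.0))


def utility(y1, y2):
    """Hypothetical decisionmaker's satisfaction from the outcome pair."""
    return math.log(y1) + 2.0 * math.log(y2)


def best_mix(total, iters=200):
    """Ternary search along the frontier for the utility-maximizing mix."""
    lo, hi = 1e-9, math.sqrt(total) - 1e-9
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if utility(a, frontier(a, total)) < utility(b, frontier(b, total)):
            lo = a
        else:
            hi = b
    y1 = 0.5 * (lo + hi)
    return y1, frontier(y1, total)


def rpt(y1, total, h=1e-6):
    """Numerical value of the frontier's slope, |dy_2/dy_1|."""
    return abs(frontier(y1 + h, total) - frontier(y1 - h, total)) / (2 * h)


T = 12.0  # assumed input endowment
y1_star, y2_star = best_mix(T)
# Slope of the indifference curve: (dU/dy_1) / (dU/dy_2) for the assumed U.
mrs = (1.0 / y1_star) / (2.0 / y2_star)
print("chosen mix:", (y1_star, y2_star))
print("MRS:", mrs, "RPT:", rpt(y1_star, T))
```

At the mix found by the search, the slope of the indifference curve and the slope of the production possibilities curve agree, which is what tangency means. Changing the weights in the assumed utility function moves the chosen point along the same frontier, without any change in technology.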


EMPIRICAL IMPLICATIONS 


Before advancing to the case of joint production in a multiproduct 
case, we explore the implications for measurement of the cases we 
have covered so far. It seems that the questions that are always at the 
center of empirical research are whether there are any inputs that 
make a difference in determining outputs, and if there are, how
“important” they are. Recent summaries of the literature on these
questions do not point to conclusions any different from those of a
few years ago. In short, the only inputs that consistently seem to
make any difference to learning outcomes are students' socioeconomic
and racial background measures. Hanushek writes:


First, almost uniformly, educational production models show no
consistent or significant relationship between achievement and
expenditures per pupil (either instructional expenditure or total
expenditure) [...] analyses of specific purchased inputs (teacher [...]
education levels, class size, and administrative/supervisory
expenditures) show a similar lack of relationship. (1978:47)


Lau (1978:11-13) stated that there was no consistent relationship
between cognitive achievement and such variables as [...] teacher
quality, and teacher attitudes. He does allude to studies
that found a relationship between scores and time inputs. The
studies cited use time measures that are not the
microeconomic time-on-task notions that we think appropriate.
Instead, the time variables usually measure [...]
length of school day, or length of school year. (See our discussion
of one of these below.) Lau concludes:

None of these rather general conclusions constitutes a startling [reversal]
of any findings of earlier surveys. In fact the conclusion of no consistent
observed relationship between cognitive achievement and school [variables]
appears to hold quite well. The only relatively new addition is perhaps the
observed importance of student time inputs in education and production
for which a substantial amount of evidence has been accumulated over the
past few years. (1978:13)


Why might purchased inputs seem to have no effect on learning? 
After all, if such were really the case, schools would be entirely ir- 
rational institutions from an economic perspective. The best explana- 
tions of these empirical regularities show them to be the result of 
using the wrong theoretical model of production—one output pro- 
duced by many inputs. The correct production model to use in em- 
pirical estimation is not the single output version but a multiple 
output system. The absence of observed productivity effects comes 
from omitting learning outcomes other than cognitive skills. In one 
version of the critique, the additional outputs are other student traits 
such as affective qualities (Gintis, 1971; Bowles and Gintis, 1976; 
Brown, 1972; or Leekley, 1974). In another version, the neglected 
outputs are the many students in a class, each of whose learning out- 
comes is assumed to matter. This implies that it is incorrect to aggre- 
gate students in classrooms by taking the mean score or learning level 
as the measure of output (Klitgaard, 1975; Brown and Saks, 1975a). 
Indeed, we (1975a) have shown that using a single output model pro- 
vides the opposite conclusions about the productivity of purchased 
teacher inputs than does using the multiple output model on exactly 
the same data. The multiple output model does show that purchased 
inputs are productive. 
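
How a mean-score measure can reverse the apparent sign of an input's productivity is easy to see in a constructed example. The sketch below is illustrative only: the two learning functions and the allocator's choices at each resource level are hypothetical, picked so that more of the input goes to the student with the lower learning function when resources rise.

```python
import math

# Assumed learning functions (illustrative, not from the text).


def f(t):
    """Student 1's learning from t units of the input."""
    return math.sqrt(t)


def g(t):
    """Student 2's learning from t units of the input."""
    return 2.0 * math.sqrt(t)


def outcomes(t1, t2):
    return f(t1), g(t2)


def input_required(y1, y2):
    """Smallest total input that can produce the pair (y1, y2)."""
    return y1 ** 2 + (y2 / 2.0) ** 2


# Hypothetical choices by the resource allocator at two input levels.
T_LOW, T_HIGH = 10.0, 12.0
q = outcomes(t1=2.0, t2=8.0)    # chosen with the smaller endowment
r = outcomes(t1=10.0, t2=2.0)   # chosen with the larger endowment

mean_q = sum(q) / 2
mean_r = sum(r) / 2
naive_effect = (mean_r - mean_q) / (T_HIGH - T_LOW)

print("low-input outcomes: ", q, "mean:", mean_q)
print("high-input outcomes:", r, "mean:", mean_r)
print("input needed for the high-input pair:", input_required(*r))
print("change in mean score per unit of input:", naive_effect)
```

Both learning functions are increasing, so the input has a positive marginal product for each student, and the second outcome pair cannot be produced at all with the smaller endowment. Even so, the mean score falls between the two observations, so a regression of mean score on total input would assign the input a negative effect.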

The production possibilities curve makes it clear why neglecting 
the presence of multiple outputs leads to problems. Figure 2-6 shows 
two product transformation curves for two different levels of input 
use. We know inputs are productive—that is, have positive marginal 
products—because DD', the curve representing outputs with the high- 
er input level, lies above and to the right of CC'. The two outputs 
measured along the axes may be either the scores of different stu- 
dents or the scores of a single student on two different learning out- 
comes. Assume for a moment the former case and that each student’s 
score matters so that it is efficient to be on the curve CC' or DD' 
rather than "inside" it. Suppose the resource allocator in the case 
of fewer resources chooses to be at point Q, while with increased 
inputs, R is chosen. If we take the mean score of the two students 
as our measure of output, we find that the average score is lower at 
R than at Q, and we might be tempted to conclude that the marginal
product of the input is negative. The 45 degree line MM' through Q
is a locus of points for which the mean score is constant. Since R lies
below MM', we must have a lower mean score. Thus, as we move to
a higher production possibility curve because we have more
productive inputs, the mean output falls! The usual single output
model would conclude that the inputs were not productive. By
assumption, that is not the case here.

Figure 2-6. Choices between two outputs when the quantity of an input
changes.

If the axes of the figure represent different outcomes for a
particular pupil, say cognitive and affective traits or mathematics and
foreign language scores, we confront a similar problem. As resources
increase, the production of one output may fall, not because of
unproductive inputs but because of the decisionmaker's tastes. A
statistical model that regresses output [...]. [...] identifying input
productivities, what is to be done? First, we must
try to include all the outputs we can in our model. This is extremely
difficult because on the one hand many are in principle not easily 
measurable (e.g., personality traits) and because on the other hand 
the number of output measures may be unmanageably large. We as 
economists do not really have much guidance to offer on this issue. 
Certainly the problem occurs in all studies outside educational pro- 
duction and is handled after a fashion or assumed away. The test 
for success or failure seems to be whether other researchers in the 
field believe the results and the model. Second, many researchers 
have suggested that both model and data be analyzed at the level 
of the individual student. Hanushek (1978) seems to take this for
granted throughout his paper, and the models he presents are all in
terms of the individual student. Lau writes:


This lack of identification of the true relationship of educational pro- 
duction is a problem which plagues a substantial proportion of empirical 
studies of education production. In principle, this problem is not in- 
surmountable. It does require, however, the collection of detailed student- 
specific data on the quantities of variable inputs including possibly de- 
tailed time budgets of the actors, and a concomitant analysis of the 
behavioral patterns of the actors in the educational process. (1978:21) 


Murnane, in his conclusions about earlier research, lists first among 
the “lessons that have been learned” that “the unit of observation
should be the individual child since intraschool variance in achieve- 
ment is much larger than interschool variances" (1975:25). 

Finally, Summers and Wolfe note that “Past attempts at esti- 
mating [production functions] have represented many inputs by 
school- or district-wide averages, rather than by the more appropriate 
pupils-specific data . . . . We conclude that the empirical investiga- 
tions have failed to find potent school effects because the aggregative 
nature of the data used disguised the school's true impact” (1977: 
39-40). They feel that “the use of pupil-specific data, and statistical 
methods appropriate to such data, account for the cheerier results 
of [their] study." 

In the paragraphs that follow, we will explore the data require- 
ments of the multiple output (i.e., many student) model for analysis 
at the individual student level. Then we will analyze some of the 
studies that purported to satisfy the disaggregation requirements of 
the model. We do this because we believe it is important to realize 
that individual data will not in general solve these problems. 


The multiple output model, when there was no jointness in
production, was written in a simple form:

    y_1 = f(t_1),
    y_2 = g(t_2),
    T = t_1 + t_2.


The assumption of two inputs and two outputs will not restrict us. 
The relevant group here is the two students together who “share” 
a single input quantity, T. Increasing the amount of t given to one 
of them is something that is, first of all, presumed to be measurable 
and that, second, decreases the amount available to the other by 
the same amount. The data requirements for such a model are for 
input amounts received by each student, as well as for their output 
performance (or value added, see Summers and Wolfe [1977]). We
show that it is not sufficient to have only input data (T) for the
group, even if we have observations on y_1 and y_2.

All studies using individual student output use classroom level
inputs as the lowest level of aggregation. That is, T is used in the
production equations for y_1 and y_2 instead of t_1 and t_2. Figure 2-7
shows two production possibilities curves for the learning outcomes
of two students for two different levels of T. In the best of worlds
the students would have identical learning curves, so let us assume
that they do. The learning curve appears in Figure 2-8. Can we
discover the shape of the learning curve by looking at the relationship
between the individual student outcomes and average input to the
two students? The answer, in general, is no, because unless T is
perfectly correlated with both t_1 and t_2, there will be a bias in the
estimates of the production parameters. When the input level is at T',
production takes place at A and the learning outcomes are y_1' and
y_2'. If we assume that each student receives the average input T'/2
instead of the true value, we will have the data points e and f
in Figure 2-8. When T rises to T'' we move to point B in Figure
2-7. Again assigning each student the average input, T''/2, we get
data points g and h. A hypothetical estimate of the learning curve
based on these four points is shown in Figure 2-8. While it is
positively sloped, it does not correspond to the true learning curve and
indeed seems to underestimate the marginal productivity of the
input. Again we have the entangling of technology and tastes. We
cannot isolate the production effects of a change in an input. We
have not been able to disaggregate sufficiently. Lau (1978) at least
understands that in this model the problem is solvable, in principle;
but no study to date has been able to secure appropriate data.

Figure 2-7. Two production possibilities curves for scores of two students.

Figure 2-8. The effects of using average input instead of pupil-specific
data. Curves shown: true learning curve and estimated learning curve;
output y against input T.
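
The argument about average inputs can be reproduced with four constructed data points. The sketch is illustrative only: the common square-root learning curve and the two allocations are hypothetical, and the helper names are ours. Each student's score is generated from the input that student actually received, but the analyst records only the class average, T/2, for both students.

```python
import math


def learning_curve(t):
    """Assumed common learning curve for both students (illustrative)."""
    return math.sqrt(t)


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical allocations (t_1, t_2) at total input levels 4 and 8.
allocations = [(1.0, 3.0), (2.0, 6.0)]

true_inputs, avg_inputs, scores = [], [], []
for t1, t2 in allocations:
    average = (t1 + t2) / 2.0          # T/2, what the analyst records
    for t in (t1, t2):
        true_inputs.append(t)          # pupil-specific input, unobserved
        avg_inputs.append(average)
        scores.append(learning_curve(t))

estimated_slope = ols_slope(avg_inputs, scores)
# Slope of the true curve over the same range of recorded inputs (2 to 4).
true_slope = (learning_curve(4.0) - learning_curve(2.0)) / (4.0 - 2.0)
# How far the recorded data points sit from the true curve.
max_gap = max(abs(y - learning_curve(x)) for x, y in zip(avg_inputs, scores))

print("estimated slope from average inputs:", estimated_slope)
print("slope of the true curve over the same range:", true_slope)
print("largest vertical gap between data and true curve:", max_gap)
```

In this constructed case the fitted line is positively sloped but flatter than the true curve over the same range, and none of the four recorded points lies on the true curve. With the pupil-specific inputs in `true_inputs`, every point would lie on it exactly. The size and even the direction of the discrepancy depend on the assumed allocations, which is the sense in which technology and tastes are entangled.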


Production Technologies and Resource Allocations within Schools 73 


A CRITIQUE OF THREE NOTEWORTHY 
RECENT STUDIES 


We would now like to look in detail at a few recent studies that have 
attracted some attention and that presumably define the state of the 
art. Summers and Wolfe (1977) is among the very best of these. They 
use a single equation, single output, many input model to try to dis- 
cover which inputs matter for student achievement over a three year 
period from third to sixth grade. Four kinds of explanatory variables 
(measured for grade 6) are employed: (1) genetic and socioeconomic 
characteristics of the student, (2) teacher quality variables, (3) other 
nonteacher school quality variables, and (4) some peer group char- 
acteristics. All these variables are stated to be pupil-specific for their 
627 students from 103 Philadelphia elementary schools. The statisti- 
cal analyses in their paper make liberal use of interaction terms in- 
volving the right-hand side variables to search for interesting non- 
linearities in the results. 

The results of Summers and Wolfe are claimed to contrast clearly 
with most earlier work. First, they find inputs that do seem to 
matter for achievement. A related result of interest, suggested by our 
earlier model (Brown and Saks, 1975a), is that different inputs have
different effects with different pupils. They attribute their “cheerier”
results to the pupil-specific data set they collected. 

What is the Summers and Wolfe model? It is the one output 
(achievement) production function model that has been used in most 
studies. They say that the interpretation of their input-output
equation should not be as a production function, citing the presence of
inputs that cannot be altered by decisionmakers and the assumption
that in theory such a function relates maximum attainable output
to inputs. However, including fixed inputs in the production
equation in no way undermines the notion of a production function or
“blurs the distinction between the variables which the educational
policymaker can control... and those which he cannot...” (1977:639).
Indeed, the standard textbook analysis of the short run in
production theory makes just such a distinction.

Summers and Wolfe make an important point in noting that
schools are not perfectly efficient so that the points we observe do
not lie on the frontier. The implication of this is that the empirical
estimates really describe a kind of average rather than maximum
efficiency. But that seems to us to be as much a problem with the
estimation techniques that have been used as with how we ought
to interpret the results. They conclude: “It seems [appropriate],
therefore, to view (1) [their achievement equation] as a simple
input-output relationship.” But if a “simple input-output relationship”
is not a production function, estimated with biases perhaps, we do not
know how to interpret it.

Since allocated time is one of the key elements of school pro- 
duction, it was useful for them to try to include such a concept. 
Unfortunately, their model does not include the time students spend 
on learning in an important way. It is true that they measure the 
effects of attendance (latenesses and unexcused absences) and
examine the impact of “disruptive incidents.” But these are only
approximate and, incidentally, very aggregative measures of the time
pupils spend trying to learn. Presumably these data, if available, did
not show any consistent impact on learning or they would have been 
included in the final results. This leads us to suspect that unexcused 
absences and possibly latenesses are not really picking up the effect 
of time lost but are a proxy for some socioeconomic characteristic. 
Summers and Wolfe do in fact identify these variables as proxies for 
“motivation of students” rather than as a production effect of 
changing time on task (1977:642). 

What can we say about their other pupil-specific input measures? 
These consist of a teacher’s college quality rating, teacher experience, 
class size, and the teacher’s score on a national teachers’ exam. All 
these variables seemed to have a significant effect on learning, but 
after all, that is why they ended up in the final results. 

The difficulty we have with these results is that the input data are 
not pupil-specific but are instead classroom-specific. The variables on 
teacher quality and class size are measures of classroom resources, 
not the specific amounts of inputs made available to specific stu- 
dents. The production model with many students requires that out- 
put data for each student be matched with the inputs to that stu- 
dent. The inputs in a classroom may be a better measure of resources 
received by a student than those in the student’s school or school 
district. But the disaggregation in the Summers and Wolfe model is 
not complete on the input side. So we still have the problem of the 
interaction between technology and tastes. For example, "Teachers 
who received B.A.’s from higher rated colleges were associated with 
students whose learning rate was greater—and it was students from 
lower income families who benefited most” (1977:644). Were the 
teachers from the higher-rated colleges more productive than other 
teachers? Or did they allocate their and their students’ time and 
other classroom resources differently? It may, after all, be that the 
teachers from the higher rated colleges have different values with 
respect to the distribution of learning outcomes and as a result use 
different classroom management strategies. But because we do not 
observe the amount of input each pupil gets, but only what the class 
gets, we are unable to disentangle the technology from the values 
applied in classroom organization. 

Teacher experience clearly affects different students differently in 
Summers and Wolfe's results. Higher achieving students seem to 
benefit more from experience, and low achieving students seem 
adversely affected. But again, it is unclear whether we are 
observing the effects of tastes or technology. It is possible, but not 
determinable in this study, that experience makes teachers more 
effective with low achieving kids, but that more experienced teachers 
systematically see to it that more time and resources get allocated to 
high achievers. We simply do not have the evidence about teachers' 
or other administrators’ values about these things. 

Summers and Wolfe's results on the effects of class size also reveal 
different effects for different kinds of students. Low achieving 
students seem to do worse in very large classes while high achieving 
students do better. Class size, as it relates to how classes are 
organized and time allocated, is an interesting variable. As more 
children are under the supervision of one teacher, the control problems 
multiply. Some teachers may see larger classes as an imperative for 
more sameness in student activities: large group lecturing, study hall, 
and so forth. Others may see it as raising the value of individual 
attention to students, particularly in a class where peers are a very 
heterogeneous group. But without explicit attention to the 
organizational factors and the resulting time allocations, we must 
withhold judgment on the effects of changing the class size variable. 

Peer group effects seem to be important to achievement. All 
students seem to benefit most from having 40 to 60 percent black 
students in the school. Increasing the percentage of high achieving 
students seems to benefit low achieving students particularly. We 
can tell stories about why these results might appear. These stories 
would probably have to do with how the students are mixed together 
in subgroups within a class and how they interact. If the groups are 
cleverly chosen, the bright students might teach the not so bright. 
Grouped differently, but no less cleverly, the bright 
students might help themselves, and the teacher would concentrate 
on the not so bright. Summers and Wolfe do not tell stories like this 
because their model and data are not designed to discriminate 
between them. Even though they have data on achievement for 
individual students, their input data are still aggregative, though 
less so than in earlier studies, and are not student-specific. 

Perhaps the most interesting results come from their interaction 
analysis of inputs. As we suggested in 1975 (Brown and Saks, 1975a, 
1975b), there has been a near total neglect of the rather obvious fact 
that schools have very different effects on different students. This 
means that how a given amount of resources is organized and spread 
out over the students can have a rather dramatic effect on the 
standard measures of output, aggregative or not. While Summers and 
Wolfe may not have had any more success than we did in isolating 
the productivity from the allocative effects, they point to an im- 
portant pathway for research. We must try to determine the differ- 
ences in the productivities across students of different kinds of 
inputs and the effects of peer group interactions on those productivi- 
ties. The measurement and analysis of time allocations would seem 
to be a promising way to proceed, particularly in terms of the 
theoretical models that we have developed so far. But life is more 
complicated than we have been willing to admit, since we have as- 
sumed independence of the student learning curves as a function of 
time. After we discuss some other recent studies of achievement, we 
will develop a model of joint production to study the implications of 
that realistic extension of the learning model. 

Ritzen and Winkler (1977) try to uncover the differences in the 
productivity of school inputs over time and for “advantaged” and 
"disadvantaged" children. Because productivities are likely to vary 
both over students and over time, allocating resources is not likely 
to be a trivial matter if one wants to optimize the total amount of 
human capital obtainable from given resources. Ritzen and Winkler 
most certainly equate learning as measured by achievement tests with 
human capital—that personal quality associated with higher pro- 
ductivity in labor markets. They state that “new learning by students 
in a given period of time is a function of their current existing human 
capital stocks and the current flow of inputs" (1977:428). Then, 
“Human capital is measured in this paper by percentile scores on 
standardized examinations of cognitive learning" (1977:430). In- 
deed, they use scores on IQ and verbal achievement tests. One reason 
they might prefer to identify their study as one of human capital 
rather than simply cognitive learning is a desire to associate them- 
selves with the school of thought that identifies the productivity of 
education as residing in skills and their assumed direct relation to 
job performance and income. For a contrasting view, one might see 
Bowles and Gintis (1976) or Gintis (1971). 

The Ritzen and Winkler model is one in which the current learn- 
ing level is a function of the past period's learning level, current 
home inputs, and current (purchased) school inputs. Learning can 
depreciate from one period to another, home inputs are assumed 
constant from year to year (a restriction imposed by the data), and 
the productivity of a given amount of school inputs is allowed to 
vary depending on when a student receives it. 

Their sample consists of 669 students from a single school district 
who completed eighth grade in 1965. They report that 356 of the 
students are black and 313 white. This is important, since they say 
that race may serve as a proxy for income in their estimates, noting 
that black and white mean family incomes 
differed by almost $2,500 “in the geographic area included in the 
school district under study” (1977:429). 

Their measures of home inputs are an index of the number of 
cultural items in the home, a variable indicating homeownership, years 
of education of mother and father, and finally, the number of 
siblings in the home. 

The treatment of school inputs in the model is best put in their 
own words. 


The quantity of instructional services provided in the school can be 
assumed to be a function of the capital and labor in the school. Capital 
includes physical capital such as books, laboratory equipment, special 
instructional aids, etc., and human capital of the teachers, 
administrators, and other personnel. At the micro level, labor in the school 
refers to the quantity of time a teacher allocates to a given child and 
the pupil's own work effort. Teacher time may be related to class size. 

While it would be desirable to have direct measures of capital and labor 
in the school, we proxy these purchased inputs by current real 
expenditures per pupil. (1977:431) 


The expenditure variable was constructed from information on 
class sizes, salaries of school personnel, and student academic 
records. What is clear from the discussion is that the input variable 
is at best classroom-specific at the elementary level, and they say 
it is track-specific for the secondary (presumably junior high) 
school year(s). Since the significance of tracking in their 
discussion is never made clear, except to note that there are two 
tracks in the junior high schools, it is not possible to evaluate the 
data in more detail. 

The results that Ritzen and Winkler present show increasing 
elasticities of achievement with respect to the purchased inputs as 
grade level increases. In short, it would appear that the marginal 
products of school inputs are generally positive and increase the 
farther one goes through the grades. 

Ritzen and Winkler rerun the equations for subsamples of the 
data and find differences for blacks and whites. Whites 
demonstrate the pattern of increasing input productivities over 
time. Blacks, on the other hand, seem to have elasticities of test 
scores with respect to inputs that are constant over time. That is, 
the productivity of school remains about constant for blacks as 
they proceed from grade 1 to 8. Why should this be? Ritzen and 
Winkler suggest that the difference may be due to the schools having 
different objectives for blacks and to blacks' different objectives for 
themselves: “The blacks in this sample are predominantly low 
income, and low income pupils may not value scholastic achievement 
as highly as middle and high income pupils, perhaps because they 
perceive an unimportant relationship between such achievement and 
future earnings" (1977:434). Now we see the motivation for their 
earlier assertion that race is a proxy for income. They believe that 
they have separated the sample along income lines and that low in- 
come means a perception of a low payoff to school achievement. 
They go on to say that blacks tend to be in the vocational track 
rather than in the college preparatory track and suggest that the 
school may be attempting to maximize some variable other than 
achievement. It may, of course, also be true that the tastes and 
preferences of the school authorities operate in a more insidious 
fashion and result in fewer resources being allocated to black pupils 
in elementary school. The aggregative nature of the input variables 
carefully insures that this possibility, if indeed it occurs, will not be 
uncovered. 

While Ritzen and Winkler may realize the possibility of a con- 
founding of tastes and technology, as in the paragraph quoted above, 
they seem convinced in the end that they have estimated production 
parameters—the technology. A page of caveats does not deter them. 
They dutifully note the difficulty of generalizing their results “to the 
world as a whole," the difficulty of identifying human capital with 
test scores, the difficulty of measuring home and school inputs, and 
finally, the possibility that education technology may change. Yet 
they venture their policy recommendations: “In those cases when 
the statistically significant results obtained in the estimation of the 
model showed production elasticities which continuously increase 
with time between grades one and eight, the optimal investment 
trajectory should be one where the quantity of purchased inputs per 
pupil also increases with time" (1977:436). Thus, the time pattern of 
expenditures should be one that tends to equate marginal productivi- 
ties across years. Since with existing allocations the returns for 
whites are higher in later years, more resources should be shifted in 
that direction. But blacks have more or less constant returns over 
time, so their allocation is about optimal. 

Ritzen and Winkler's model and conclusions show a lack of sensi- 
tivity to the interactive roles of technology and values in the deter- 
mination of achievement. High returns in some grades for whites may 
be due to a failure to observe the true input quantities they receive. 
If they in fact got more inputs than the classroom or track average 
and if the inputs they got were productive, we would overestimate 
the productivity of the observed inputs. But on the other hand, the 
high returns may be an accurate measure of marginal returns. The 
point is that we simply do not know and cannot tell from the evi- 
dence presented. The results are inconclusive, and the policy 
prescriptions unjustified. 
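
The overestimation argument above can be illustrated with a small numerical sketch. It is not drawn from Ritzen and Winkler's data: the true productivity, the shares of the class average that each group receives, and the four class averages are all assumed values, and `ols_slope` is a hypothetical helper written for this illustration. The researcher is assumed to observe only the class average input.

```python
def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs (one regressor plus an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


# All numbers below are assumptions made for illustration only.
TRUE_PRODUCTIVITY = 2.0  # achievement gain per unit of input actually received
FAVORED_SHARE = 1.5      # favored pupils receive 1.5 times the class average
OTHER_SHARE = 0.5        # the remaining pupils receive half the class average

# The only input measure the researcher observes: the class average per pupil.
class_average_input = [4.0, 6.0, 8.0, 10.0]

# Gains depend on the input each group actually receives, which is unobserved.
favored_gain = [TRUE_PRODUCTIVITY * FAVORED_SHARE * m for m in class_average_input]
other_gain = [TRUE_PRODUCTIVITY * OTHER_SHARE * m for m in class_average_input]

favored_estimate = ols_slope(class_average_input, favored_gain)
other_estimate = ols_slope(class_average_input, other_gain)
print(favored_estimate, other_estimate)
```

Under these assumptions both groups share the same true productivity, yet the regression on the class average attributes a higher productivity to the favored group and a lower one to the others. The difference reflects the allocation rule, not the technology.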

Murnane (1975) leads us carefully, step by step, through a maze of 
data and analyses of how and whether schools affect pupils’ achieve- 
ment. In all, he tests thirty hypotheses, ranging from whether the 
classroom to which a child is assigned affects the child's achievement 
(it does) to whether black and white teachers show a different re- 
lationship between experience and teaching performance (they do 
not). 

Murnane employs the single output, many input production model 
in which a child's achievement at the end of the school year depends 
on achievement at the beginning of the year, background character- 
istics, number of school days attended, and a vector of school and 
teacher characteristics. His sample consists of 875 inner-city black 
children. 


In the first stages of analysis, he concludes that principals’ 
evaluations of teachers do predict performance in raising the pupils' 
achievement. Class size and the peer group variables of mean and standard 
deviation of achievement in a class seemed to have little relation to 
achievement. With these preliminaries out of the way, Murnane 
poses an interesting set of questions that really go to the core of 
some of the issues we want to raise. Does perceived input produc- 
tivity vary with student achievement level? Are some teachers more 
productive with certain kinds of children? Does classroom student 
turnover adversely affect high or low achieving pupils more? These 
are questions about both the nature of the technology in a class- 
room and the tastes or values of teachers. Murnane realizes, perhaps 
better than any researcher whose work we have read, that his answers 
involve both tastes and technology, so that the regression coefficients 
in his model should not be interpreted as marginal productivities. His 
treatment of the effect of turnover is a good example. 

Why would student turnover especially affect the progress of children 
with high initial reading achievement? One plausible [explanation is that] 
teachers compensate for the time lost in dealing [with turnover] 
by spending less time with those children who can [...], 
namely, those children with high achievement levels. [If this is] 
correct, why does the effect not appear in the three math [samples? The answer] 
may lie in differences in the way reading and math are taught in the 
primary grades. Most teachers divide their children into several reading 
ability groups and allocate time to each group. Thus, the possibility of 
spending less time with the best readers exists. Math, however, is more 
often taught to the whole class at the same time at the primary grade level. 
Thus, there is less of an opportunity to make an explicit decision to 
spend less time with “high achievers” in order to have more time for other 
children. (1975:47) 


He also tests the hypothesis that “the relationship between 
principals’ evaluations and teacher performance in improving the cognitive 
skills of students is different for black teachers than it is for white 
teachers” (1975:49). This is a test of whether principals' evaluations 
predict the performance of students in the same way depending on 
whether they had black or white teachers. Indeed, it seems to make 
a difference (evaluations of blacks are poorer predictors), and Mur- 
nane offers two explanations, neither of which has anything to do 
with any possible productivity differences between the two groups of 
teachers. 

One explanation suggests that black teachers may be especially 
highly rated for their accomplishments in raising the noncognitive 
skills of students. This is consistent with Murnane's earlier statement 
that education has many goals and schools multiple outputs. An al- 
ternative is that the principals, most of whom were white, were less 
capable of evaluating black teachers because they were less familiar 
with their teaching techniques and styles. 

Murnane believes that he has identified—and he may have—some 
pure productivity effects. Most interesting is the effect of experi- 
ence on achievement scores. Experience seems to increase teacher 
productivity up to three to six years of experience. The gains from 
experience are apparently exhausted after six years and perhaps after 
as few as three. But any conclusions about other teacher character- 
istics that may affect learning must be weighed carefully. Murnane's 
data are another example of individual data on student achievement 
being joined with classroom level data on inputs, particularly teacher 
characteristics. Internal organizational factors that reflect values can 
confound the analysis and prevent us from finding input marginal 
productivities. But we feel his research represents at least the best 
link between the single output model and the more realistic and 
useful models where intraclassroom allocations are fundamental. 


JOINT PRODUCTION 


We now need to consider cases of multiple outputs where production 
of one output affects the production of another in more complex 
ways than simply through the resource constraint. We think that the 
applicability of this model to classes where inputs get shared in com- 
plex ways is obvious. Economic analysis of joint production usually 
begins with the case of fixed proportions. Suppose some product, $y_1$, 
is made using a single input in its production, x. We may write the 
production function as $y_1 = f(x)$. Suppose also that there is another 
output, $y_2$, that is produced along with $y_1$ in constant proportions 
without the use of any further inputs. An often used example of 
such a process is the production of hides and beef. Sometimes one of 
the outputs is called a by-product. If both of the outputs and the 
input have constant market prices or values, $P_1$, $P_2$, and $P_x$, we ask 
what amount of the input, product, and by-product we should 
choose if we want to maximize profits, the difference between 
revenues and costs. That is, we want to 

maximize: $R = P_1 y_1 + P_2 y_2 - P_x x$, 
subject to: $y_1 = f(x)$, 
            $y_2 = k y_1$, 

where k is the (constant) factor of proportionality between the 
outputs. The solution to the problem is to choose x, the input quantity, 
to satisfy 

$$P_1 \frac{\partial f}{\partial x} + P_2 k \frac{\partial f}{\partial x} = P_x.$$

The input should be used up to the point where the extra cost of 
buying another unit, $P_x$, is equal to the extra return from buying 
it. This extra return is the sum of two parts, the extra receipts from 
the sale of the extra $y_1$, $P_1 (\partial f/\partial x)$, and the extra receipts 
from the sale of the extra $y_2$, $P_2 k (\partial f/\partial x)$. 
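
As a check on this condition, the following sketch solves the fixed proportions problem for one assumed case. The production function f(x) = sqrt(x), the prices, and the proportionality factor are illustrative assumptions, not values from the text, and the function names are hypothetical. With f'(x) = 1/(2 sqrt(x)), the condition (P_1 + P_2 k) f'(x) = P_x can be solved in closed form.

```python
import math


def joint_profit(x, p1, p2, px, k):
    """Profit R = P1*y1 + P2*y2 - Px*x with y1 = sqrt(x) (assumed) and y2 = k*y1."""
    y1 = math.sqrt(x)
    y2 = k * y1
    return p1 * y1 + p2 * y2 - px * x


def marginal_return(x, p1, p2, k):
    """Extra receipts from one more unit of x: (P1 + P2*k) * f'(x)."""
    return (p1 + p2 * k) / (2.0 * math.sqrt(x))


def optimal_input(p1, p2, px, k):
    """Closed-form solution of (P1 + P2*k) / (2*sqrt(x)) = Px."""
    return ((p1 + p2 * k) / (2.0 * px)) ** 2


# Illustrative prices and proportionality factor (assumptions).
p1, p2, px, k = 4.0, 2.0, 1.0, 0.5

x_star = optimal_input(p1, p2, px, k)
print(x_star, marginal_return(x_star, p1, p2, k), joint_profit(x_star, p1, p2, px, k))
```

At the computed input level the marginal return from both outputs together equals the input price, and moving the input slightly in either direction lowers profit.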

While this version of the problem certainly makes clear the idea of 
jointness, there is another way of stating it that is useful for analyti- 
cal purposes. Consider the following problem, which has the same 
solution as the one above: 

maximize: $R = P_1 y_1 + P_2 y_2 - P_x x$, 
subject to: $y_1 = f(x_1)$, 
            $y_2 = g(x_2)$, 
            $x_1 = x_2 = x$. 

The best amount of x to choose must satisfy 

$$P_1 \frac{\partial f}{\partial x_1} + P_2 \frac{\partial g}{\partial x_2} = P_x,$$


which has the same interpretation as before. Use the input, x, up 
to the point where the extra cost associated with the use of another 
unit is equal to the extra benefits. The benefits have two parts, since 
extra x produces both $y_1$ and $y_2$. Indeed, one way of looking at 
fixed proportions production is to say that each output gets the 
benefit of all the input. If the amount of input is increased to one 
product, the increase goes to all the other products as well. All 
products “consume” the extra input equally. 

Notice that this second formulation of the joint production prob- 
lem is exactly the same as our multiple output case except for the 
specification of the last constraint. Where separability in production 
was the rule, we had $x = x_1 + x_2$, so that an increase of $x_1$ meant 
an equivalent reduction in $x_2$, if total x was constant. Here an 
increase in $x_1$ does not reduce $x_2$. In fact, both $x_1$ and $x_2$ must rise. 

This suggests that there may be intermediate cases of jointness 
in production and that the fixed proportions and perfect sep- 
arability cases are only extremes of a more general model. The last 
input constraint in the above problems holds the key. We want for 
that constraint a function that will tell us the amount by which an 
input increase for one good must reduce input availability for other 
goods. Elsewhere we have called this the “input exhaustion con- 
straint" (Brown and Saks, 1975a), and it is the key to understanding 
where the gains from jointness arise, if they exist at all. First, 
jointness does not exist if the input exhaustion constraint for x is simply 
the sum of the amounts of x used to produce all the goods. A quick 
test is the following: When I increase the amount of x to one out- 
put, does this show up as a decrease to all other outputs together in 
an equal amount? If the answer is no, jointness is present. 

As has been pointed out elsewhere, jointness in production has 
many formal similarities to the economic analyses of public goods 
and externalities. A pure public good in consumption is one that is 
consumed in its entirety by everyone ($Z_j = Z$, $j = 1, \ldots, n$, for n 
consumers of Z). Externalities are characterized by lack of 
independence between production and/or utility functions. It is a matter 
of choice in some instances whether a particular case of productive 
interdependence is analyzed as jointness or as an externality. For ex- 
ample, electricity and smoke are produced in fixed proportions from 
coal. Or smoke is an external effect of electricity production. It 
usually matters little for the problem at hand which way we choose 
to imagine the interdependence in production taking place. We shall 
make use of the similarities between the cases below. 

We saw that in the case of independently produced multiple out- 
puts, it was possible to construct a production possibilities curve 
showing the locus of output possibilities when all inputs are fixed 
in quantity. This is, in general, also possible in the case of joint pro- 
duction. In fact, the shapes of the production possibilities curves 
will be similar to those of the independent output case, except in 
the case of fixed proportions jointness, where the "curve" is just a 
point in output space. 

There are several important implications of the possible existence 
of jointness for understanding the economics of classrooms. The first 
is that we cannot understand the production relationship for an in- 
dividual student in a particular subject by observing the inputs that 
pupil receives. In the case of the independent production model, 
that was a proper course in principle, but measurement was diffi- 
cult. But with jointness, the individual production variables interact 
in possibly complicated ways. Measurement of individual input 
applications is not merely difficult but is theoretically impossible. The 
problems become apparent with a simple example. Let there be a 
classroom of two students, A and B, who are taught by a single teacher. 
The teacher may tutor the students individually or teach them 
together as a “class” of two pupils. 

When the students are tutored, their learning curves (production 
functions) are 

$L_A = 10 + T_{AT}$, 
$L_B = 5 + 2T_{BT}$, 
$10 = T_{AT} + T_{BT}$, 

where $L_A$ and $L_B$ are the learning levels, and $T_{AT}$ and $T_{BT}$ are the 
tutoring times for A and B. There are ten hours available. The 
production possibility curve for this technology is shown in Figure 2-9 
as the line segment PP′. 

Consider now an alternative technology in which the students are 
grouped in a class and the teacher lectures both at the same time: 

$L_A = 10 + 0.8T_{AC}$, 
$L_B = 5 + T_{BC}$, 
$10 = T_{AC} = T_{BC}$. 

Figure 2-9. Production possibilities for two ways of organizing production. 


The last pair of equations tells us that we have the properties of joint 
production. 

The “production possibilities curve” is the single point ($L_B = 15$, 
$L_A = 18$) represented as Q in Figure 2-9. Notice that it lies 
“outside” the tutoring production possibilities curve PP′, so the “lecturing” kind of 
class organization is not dominated by tutoring organizations. Which 
technique will be picked depends on tastes. 
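
The comparison of the two organizations can be verified with a short sketch that uses the coefficients of the example (ten hours; gains of 1 and 2 per tutoring hour for A and B; gains of 0.8 and 1 per lecture hour). The function names, the `dominates` helper, and the sampling of the tutoring frontier every tenth of an hour are assumptions made for illustration.

```python
TOTAL_HOURS = 10.0


def tutoring(hours_for_a):
    """Learning levels (L_A, L_B) when A is tutored hours_for_a and B gets the rest."""
    hours_for_b = TOTAL_HOURS - hours_for_a
    return 10.0 + hours_for_a, 5.0 + 2.0 * hours_for_b


def lecturing():
    """Learning levels (L_A, L_B) when both students sit through all ten lecture hours."""
    return 10.0 + 0.8 * TOTAL_HOURS, 5.0 + TOTAL_HOURS


def dominates(p, q):
    """True if outcome p is at least as good as q for both students and differs from q."""
    return p[0] >= q[0] and p[1] >= q[1] and p != q


q_point = lecturing()

# The tutoring frontier PP', sampled every 0.1 hour of A's tutoring time.
frontier = [tutoring(h / 10.0) for h in range(0, 101)]

q_is_dominated = any(dominates(p, q_point) for p in frontier)
q_dominates_all = all(dominates(q_point, p) for p in frontier)
print(q_point, q_is_dominated, q_dominates_all)
```

No sampled tutoring allocation does at least as well as lecturing for both students, so Q lies outside PP′. Lecturing does not beat every tutoring allocation either (giving B all ten tutoring hours yields a higher level for B), which is why the choice between the techniques depends on tastes.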

While these cases represent extremes—we might expect a teacher 
to use some lecturing and some tutoring—we can get a glimpse of the 
frustration that researchers face if they try to measure input pro- 
ductivities by looking at “time getting instruction” and relating it to 
outcomes on an individual basis. What we need to know are the times 
and outcomes, certainly, but we must realize that the results are 
specific to a particular way of organizing production. If we mix up 
the data from lecturing and tutoring classes, say, the results will tell 
us nothing about the underlying productivities, even if the data are 
on the level of the individual student. Indeed, in this case (as we have 
pointed out in Brown and Saks [1975b]), it is essential to aggregate 
one's data up to the level where all of the jointness is internalized. 
It is important to realize that in a world of joint production, disaggre- 
gated data are not always better. 
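
A small sketch makes the danger of pooling concrete. It reuses the learning curves of the two-student example and compares one tutoring class with one lecture class. The even five-hour split of tutoring time is an assumption, as is the use of a simple two-point slope in place of a regression; `slope` is a hypothetical helper.

```python
def slope(point_1, point_2):
    """Slope of learning level on hours of instruction between two observations."""
    (hours_1, level_1), (hours_2, level_2) = point_1, point_2
    return (level_2 - level_1) / (hours_2 - hours_1)


# Tutoring class: ten teacher hours split evenly (assumed), five hours per student.
tutored_a = (5.0, 10.0 + 1.0 * 5.0)
tutored_b = (5.0, 5.0 + 2.0 * 5.0)

# Lecture class: both students receive all ten hours of instruction at once.
lectured_a = (10.0, 10.0 + 0.8 * 10.0)
lectured_b = (10.0, 5.0 + 1.0 * 10.0)

# Pooling the two classes and relating individual time to individual outcome.
pooled_slope_a = slope(tutored_a, lectured_a)
pooled_slope_b = slope(tutored_b, lectured_b)
print(pooled_slope_a, pooled_slope_b)
```

For student B the pooled data suggest that instruction time has no effect at all, although B gains 2 per tutoring hour and 1 per lecture hour. For student A the pooled slope matches neither the tutoring nor the lecturing productivity. Both results are artifacts of mixing two ways of organizing production.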


TOWARD MODELS OF TIME 
ALLOCATION IN CLASSROOMS 


Economists have used two kinds of models to describe the produc- 
tion technologies of firms. The first, and more common, might be 
called the assembly line. The characteristics of a pure assembly line 
mode are that (1) each unit of output can be described as having the 
same set of characteristics as every other unit, (2) each unit of input 
has the same characteristics as any other unit of that input and (3) in 
the production of output, every unit passes through the same stages 
of processing and in exactly the same order as other units. Pure pro- 
duction line technologies are rare, although some subspecies are easy 
to find, for example, the production of a certain octane gasoline 
from crude oil of known and consistent chemical composition, or a 
particular model of automobile with given color, options, and so 
forth. 

The second kind of technology model is the job shop. It is, in 
most important respects, the exact opposite of the assembly line. 
(See Reiter [1966] for a discussion of the features of job shops.) 
Each unit of output differs from every other; inputs may even 
vary in type or quality, and the process of transforming inputs into 
outputs differs for each unit of output. While again the pure case is 
rare, an automobile repair shop or machine tool fabrication and 
repair facility are suggestive examples. 

One feature of real world job shops is that they may 
occasionally process orders where many units require the same inputs 
and transformation procedures. Because the set-up time for a 
particular process is usually not zero, the units requiring similar processing 
may all be done together (the technical term is batching). This 
results in a kind of mini-production line run in the midst of a 
job shop technology. It is easy to see that a particular real world 
technology may have elements of both modes. 
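
The gain from batching can be shown with simple arithmetic. In the sketch below the set-up time, the processing time per unit, and the order of six similar units are assumed figures, and `total_time` is a hypothetical helper.

```python
def total_time(batch_sizes, setup_time, unit_time):
    """Total processing time when every batch incurs one set-up."""
    return sum(setup_time + size * unit_time for size in batch_sizes)


# Assumed figures: 3 hours to set up a process, 1 hour to process each unit.
SETUP_HOURS = 3.0
UNIT_HOURS = 1.0

# Six similar units handled one at a time, job shop style.
one_at_a_time = total_time([1] * 6, SETUP_HOURS, UNIT_HOURS)

# The same six units run together as a single batch (a mini production line).
batched = total_time([6], SETUP_HOURS, UNIT_HOURS)

print(one_at_a_time, batched)
```

With these figures the batch needs one set-up instead of six, so the saving equals five set-up times. The saving vanishes if set-up time is zero, which is why batching matters only when set-ups are costly.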

Basic to both models is the idea that the production of a unit of 
a commodity can be thought of as being broken down into a set of 
tasks or operations on inputs. Thus the tune-up of an auto engine 
may entail raising the hood, removing and checking spark plugs, 
attaching wires for certain measuring devices, and so forth. Each 
operation requires the time of a person or a machine with special 
characteristics. Furthermore, some operations may have to be done 
in a particular order—for example, the hood must be raised before 
the spark plugs are removed. We shall assume that the tasks to be 
done in making a unit of output, though not necessarily their order 
or mode of completion, can be specified independently of whether 
the task is ultimately carried out in a job shop or on a production 
line. It then becomes an important economic question whether a 
good should be produced in an assembly line or a job shop mode of 
production. 

One further difference between the assembly line and the job shop 
is important. Because units of assembly line inputs and outputs are 
homogeneous, the measurement of input and output is relatively 
easy on the plant level. Indeed, this can be accomplished by measur- 
ing simple statistics on total or average production, since there are 
no relevant variations among individual output units. For the job 
shop, however, describing output can be difficult unless we are 
willing to settle for a complete list of units produced, along with 
their characteristics. What is in the assembly line a simple question of 
more or less becomes a large data-handling or aggregation problem 
for the job shop, where the issue is often which goods to produce to- 
day and which not. A related aspect of job shops is that the com- 
plexity makes analytic solutions to the scheduling problem almost 
impossible. Rules of thumb are commonly used for allocation and 
management. 

We believe that the dichotomy between assembly line and job 
shop modes can yield useful insights into the functioning and evalua- 
tion of schools as production units. We have argued elsewhere 
(Brown and Saks, 1975a) that the task of schools is to change the 
characteristics of particular students. In a typical school, or even in a 
classroom within a school, we find a wide diversity among the 
students. They may differ in their initial endowments of knowledge 
or skill or social traits. They may acquire traits at different rates, de- 
pending on the characteristics of classmates, teachers, or instructional 
materials or on how activities within the classroom are organized. 
The final desired set of characteristics that are the result of schooling 
may vary from student to student. Because the inputs are unique to 
students and because students are probably not homogeneous, the 
classroom would seem to be a good example of job shop production. 
It is the multiple output, joint production perspective of teaching 
that makes the analysis of job shops relevant to education. 

There is an important distinction between a school and a typical 
industrial job shop, however. The school could be (and some are) 
operated like production lines in the sense that all units are treated 
exactly the same way. While it would probably occur to no one to 
operate an auto repair shop like an assembly line, such is not the case 
with schools. In fact, it seems to us that many of the innovations in 
modern education are really technical changes of the kinds that or- 
ganize existing inputs differently, as opposed to using new hardware. 
The use of “tracking” or ability grouping may be a good example of 
trying to “stream” the processing of nearly homogeneous input units. 

In the next section we examine some production technologies for 
individual students. Following that, we explore the implications of 
grouping students in classrooms and the effects of allocating class- 
room resources in different ways. 


CHANGES IN STUDENT TRAITS 


For purposes of illustration, we shall assume that the change in a 
student's learning trait (say, the number of words that can be sight- 
read or spelled or the knowledge of the "sevens" row of the multi- 
plication tables) is a function of three sets of variables: the student's 
learning traits acquired up to a time t*, some characteristics of the 
students with whom our student is grouped, and the productivity of 
various kinds of inputs. This learning gain function for a student is, 
at least for now, assumed to be deterministic and monotonic in the 
independent variables. Ordinarily, though not necessarily, the effects 
of increasing inputs will be positive. We have no strong prior judg- 
ments to make about the signs of the effects of the other variables. 
Because we are interested primarily in the effects of reallocating 
inputs under various conditions, it will be helpful to think of the 
input productivities themselves as depending on the other factors. 

Now suppose we have a group of N students with different but 
known learning gain curves. There are assumed to be available at least 
a teacher and some other inputs such as books, chalk, chairs, desks, 
and maybe even a teaching machine. The problem we wish to in- 
vestigate, and the one that confronts every teacher, is how to allocate 
the limited resources among the students. This encompasses the 
narrower question of the order in which certain resources are used, 
and whether or not the students are better off in particular sub- 
groups. This is a 
dynamic programming problem that, depending on the particulars, 
may not be easy to solve. Such problems are, in principle at least, 
capable of solution by complete enumeration to any desired degree 
of accuracy. 

Because such problems are not readily solved in real world situa- 
tions, we are not concerned with the problem of finding optimal 
solutions or with the characteristics of such ideal outcomes. Rather, 
because such problems are tackled instead by applying rules of 
thumb, we shall concentrate on comparing the implications of 
different kinds of management strategies. This is the approach taken 
by Radner and Rothschild (1975) and would seem to be a particu- 
larly fruitful one for classroom management, since optimal control 
there seems to be a particularly distant prospect. Indeed, much of 
the history of classroom structure and organization can be described 
in terms of rules of thumb—the Lancasterian memorization lock- 
step methods of the nineteenth century, the somewhat less rigorous 
age-sex groupings of the early twentieth century and the age group- 
ings of the present day, ability grouping or tracking, self-paced 
instruction, and even busing to correct racial imbalance. 

Our approach, then, is to take the underlying microstructure of 
learning curves for individual students and to apply rules of thumb 
for classroom organization and management. Very simple models 
using few students and uncomplicated learning curves will be ex- 
plored as an example. 


HYPOTHETICAL CASE 


Suppose there are two students with learning curves 

$$G_{ij} = t_{ij}^{\alpha_{ij} - \beta_{ij}\delta_j} \qquad (j = 1, 2),$$

where $G_{ij}$ is the score gain in a subject for student i when he uses 
mode j for learning. The "time on task" is $t_{ij}$ and equals m - T, 
where T is the set-up time necessary to begin operation in a mode. 
The Kronecker delta, $\delta_j$, will assume values of 0 or 1 depending on 
the mode used to organize instruction, while $\alpha_{ij}$ and $\beta_{ij}$ are measures 
of the responsiveness of the gain score to additional time spent. We 
require $\alpha_{ij} - \beta_{ij}\delta_j > 0$ for either value of $\delta_j$. 

In our example, two modes will be used for illustrative purposes: 
grouping ($\delta_j = 1$) and tracking ($\delta_j = 0$). Grouping means the students 
always work on the same task and in the same mode. Tracking 
means, in this case, that the two students are always using different 
modes. To assist our thinking about the problem, we might imagine 
one mode to be "receiving instruction directly from the teacher" and 
the other to be "studying and doing problems in a workbook." 

We can illustrate how this sort of analysis might work by choosing 
some hypothetical values for the parameters of the learning curves. 
For example, we arbitrarily set: 


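$$G_{11} = t_{11}^{.85 - .10\delta_1}, \qquad G_{12} = t_{12}^{.4 - .10\delta_2} \qquad \text{for student 1;}$$

$$G_{21} = t_{21}^{.5 - .10\delta_1}, \qquad G_{22} = t_{22}^{.25 - .10\delta_2} \qquad \text{for student 2.}$$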


Thus, for the first student the elasticity of gain score with respect 
to time receiving instruction is .85 (= .85 — .10[0]) if the teacher 
works with the student alone, .75 (= .85 - .10) if the students are 
taught together. Instructional time, in this example, is more pro- 
ductive for both students than workbook or seatwork time. And 
when the students are grouped to do tasks, time is less effective. 

First consider the score outcomes for the two students. Let the 
time the teacher has to devote to instruction be 240 minutes. If the 
tracking mode (j = 0) is adopted, each student may receive from 0 to 
240 minutes of direct instruction. The student not receiving instruc- 
tion works alone. The possible scores for the students are shown by 
the curve labeled "tracked" in Figure 2-10. If, on the other hand, 
the students are grouped so that they are always engaged in the same 
activity (j = 1), the possible scores are shown as the line labeled 
“grouped.” There is more than one “grouped” outcome because the 
teacher still has the option of varying his or her time with the stu- 
dents from 0 to 240 minutes. 

Note that under what many people would consider reasonable 
assumptions, most of the points in the "grouped" set are inefficient. 
But which mode of organization and which time allocation within 
that mode will be chosen depends on the relative values attached 
to gains for the two students. We have suggested elsewhere (Brown 
and Saks, 1975a) that the distribution of scores can be conveniently 
summarized in terms of its mean and standard deviation. In this 
simple example, we can use the mean and range. The mean-range 
combinations for the two modes of organization in this example 
are shown in Figure 2-11. 
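
The comparison of the two modes can be sketched numerically. The sketch below is only one possible reading of the example, and several things in it are assumptions rather than part of the text: a student's score is taken to be the sum of the gains from the two modes, set-up time is set to zero, the teacher's 240 minutes are split on a 10-minute grid, and under tracking the student who is not being taught does seatwork. The function names are hypothetical.

```python
# Elasticities from the example: one per student and mode.
ALPHA = {1: {"instruction": 0.85, "seatwork": 0.40},
         2: {"instruction": 0.50, "seatwork": 0.25}}
BETA = 0.10            # loss in elasticity when the students are grouped
TEACHER_MINUTES = 240  # teacher time available for instruction


def gain(student, mode, minutes, grouped):
    """Score gain t**(alpha - beta*delta); delta is 1 when grouped, else 0."""
    exponent = ALPHA[student][mode] - BETA * (1 if grouped else 0)
    return minutes ** exponent


def tracked_scores(t):
    """Student 1 is taught for t minutes while student 2 does seatwork, then they swap."""
    rest = TEACHER_MINUTES - t
    s1 = gain(1, "instruction", t, False) + gain(1, "seatwork", rest, False)
    s2 = gain(2, "instruction", rest, False) + gain(2, "seatwork", t, False)
    return s1, s2


def grouped_scores(t):
    """Both students are taught together for t minutes, then do seatwork together."""
    rest = TEACHER_MINUTES - t
    return tuple(gain(s, "instruction", t, True) + gain(s, "seatwork", rest, True)
                 for s in (1, 2))


def mean_and_range(scores):
    """Summary used in the text: mean score and the gap between the two students."""
    return sum(scores) / len(scores), max(scores) - min(scores)


def dominated(point, others):
    """True if another point is at least as good for both students and differs."""
    return any(o[0] >= point[0] and o[1] >= point[1] and o != point for o in others)


if __name__ == "__main__":
    grid = range(0, TEACHER_MINUTES + 1, 10)
    tracked = [tracked_scores(t) for t in grid]
    grouped = [grouped_scores(t) for t in grid]
    inefficient = sum(dominated(g, tracked + grouped) for g in grouped)
    print(f"grouped allocations dominated: {inefficient} of {len(grouped)}")
    for t in (0, 120, 240):
        m, r = mean_and_range(tracked_scores(t))
        print(f"tracked, t={t:3d}: mean={m:6.2f} range={r:6.2f}")
```

Which point is finally chosen still depends on the values attached to each student's gain; the sketch only separates dominated from undominated allocations under the stated assumptions.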

Figure 2-10. Scores of two students under tracking and grouping. 

While the tracked mode might seem to be a superior method of 
organization, this is not necessarily the case. For tastes that put a 
great value on reducing the differences between students (extreme 
"levelers"), the grouped mode might be chosen because it will give 
the lowest range of scores. Note also that within the tracking mode, 
a higher mean is always associated with a higher range, so that max- 
imizing the mean score also means maximizing the range. If in- 
creasing the range given the mean is perceived to be bad, some 
intermediate time allocation may be chosen. One might even choose 
a lower mean in the tracking mode than would be possible under 
some time allocations in the grouping mode because of the effect on 
the range. Thus, tracking need not result in greater variance or range 
than some grouping allocations. 

Let us summarize. The model of many outputs with joint pro- 
duction suggests that the job shop may be a fruitful analogy for 
understanding classroom organization. Because job shops, and indeed 
classrooms, are so complex, it is useful to examine the results of 
applying different rules of thumb. It will not generally be possible 
to decide among rules without applying some value scheme or 
utility function. 

Figure 2-11. Mean and range of scores of two students under tracking and 
grouping. 

Harnischfeger and Wiley (1976) take up the question of class- 
room organization as a problem in the allocation of time. They 
begin with the "abundantly obvious" hypothesis that "the more 
time an individual spends trying to learn, the more he will learn" 
(p. 6). With time as the focus of the analysis, they assert that we 
ought to look at the nature of classroom activities, their content, 
and their duration. The nature and content of activities is expected 
to vary from school to school and even from student to student. 
Duration is just another way of describing the frequency and in- 
tensity of the activities a student may undertake. 

The task of the classroom teacher is the allocation of resources, 
the principal one of which is time. Harnischfeger and Wiley contrast 
this approach, quite rightly, with an analysis of teaching effective- 
ness that emphasizes subject matter content or teaching style. There 
is simply more to teaching than setting curricula and letting teachers 
teach it. Teachers can control, within rather wide bounds, how much 
time they will devote to particular pupils and how the pupils will be 
grouped together for certain specific activities. Cooperation and in- 
teraction among pupils and teachers play a central role in their 
analysis, as illustrated by their lengthy example (1976: 23-27), 
which we reconsider in detail below. 

We find Harnischfeger and Wiley's approach very much in the 
spirit of what we are suggesting. As we do in our model, they look 
at the individual student and how learning in a subject varies with 
time spent and with how the student is grouped with others in a 
class. They consider grouping of students and time spent on learning 
tasks as the main objects of teacher management strategy. As a 
model for future empirical studies, their work points up the necessity 
for finding the individual student learning curves under different 
assumptions about grouping students. 

However, although the Harnischfeger and Wiley model of the pro- 
duction of skills is similar to ours, we find their treatment and evalua- 
tion of the grouping strategies inadequate. Their example hypothesizes 
three classrooms that do not differ from each other in pupils’ char- 
acteristics. The students in each class are grouped into thirds (upper, 
middle, and lower) on the basis of prior achievement. Though they 
do not make it explicit, it is best to think of each third as being 
homogeneous in every way. The teachers for the three classes have 
“equivalent skills," an assumption designed to get rid of idiosyncratic 
characteristics that might make some teachers more or less pro- 
ductive with the same students and to emphasize the effects of dif- 
ferent management strategies. The analysis is based on a table of 
marginal productivities of spending extra hours of instruction on in- 
teger addition (the task). The marginal products vary by student 
achievement group and by whether the hour is spent in total class 
work, in a subgroup made up of those in the same third of past 
achievement, or in seatwork. Table 2-1 (their Table 3) shows these 
data. 

Studying the table, we see that the upper third is at least as pro- 
ductive in each instructional setting as any other group. Except for 
the middle third, seatwork and work as a subgroup are more pro- 
ductive than total classwork. 

Harnischfeger and Wiley's production framework includes the 
characteristics of jointness in production, the interrelations among 
productive units that we explored earlier. For example, an hour 
spent with the upper third as a subgroup increases achievement by 
2 units. When the other two-thirds of the class are added, giving us 
the total class setting, the hour generates only 1 unit for the upper 
third. Total class instruction is, in fact, a good example of fixed pro- 
portions jointness in the sense that in that setting, adding an hour of 
instruction to one group increases the input to every group by ex- 
actly the same amount. 

Table 2-1. Hypothetical Marginal Products (in achievement units) of an Hour 
Spent on Integer Addition. 

Achievement              Instructional Setting 
Group             Total Class    Subgroup    Seatwork 
Upper third           1.0           2.0         2.0 
Middle third          1.0           1.5         1.0 
Lower third            .5           1.0         1.0 

Source: Harnischfeger and Wiley, 1976, Table 3. Used by permission. 

Another interesting feature of the example is that some learning 
can be produced with no teacher input at all. The upper third, in 
fact, does best if they are simply ignored by the teacher and left to 
do seatwork perpetually. Only the middle third shows any gain from 
subgroup (individualized) instruction over what could be gotten from 
seatwork. Jointness in production results in an hour of classwork 
being more productive than an hour spent with any subgroup. Thus, 
shifting an hour of subgroup instruction from the upper third, with 
a loss of 2 units, to total class instruction, with a gain of 2.5 units, 
would increase the total achievement of the class. 

Harnischfeger and Wiley then created three teachers, Buehler, 
Ewald, and Oates, each of whom employs a different time allocation 
or classroom management strategy. Each teacher and student has 
3 hours to spend. These hypothetical allocations are set out in Table 
2-2. Buehler has an affinity for total class and subgroup work, and 
Ewald shuns total class instruction entirely. Oates, like Buehler, 
spends half the time with the total class and half on subgroup in- 
struction with the lower two-thirds. The effects of these allocations 
on achievement are given in Table 2-3. 

Table 2-2. The Allocation of Time to Teacher Activities and Pupil Pursuits 
(in hours). 

                                          Pupil Time 
Achievement     Teacher    Total    Sub-     Seatwork      Seatwork 
Group           Time       Class    group    (addition)    (other)     Total 

Teacher Buehler's grouping strategy 
Total class       1.5 
Upper third       0.5       1.5      0.5        0.0           1.0       3.0 
Middle third      0.5       1.5      0.5        0.5           0.5       3.0 
Lower third       0.5       1.5      0.5        1.0           0.0       3.0 
Total             3.0 

Teacher Ewald's grouping strategy 
Total class       0.0 
Upper third       0.5       0.0      0.5        1.0           1.5       3.0 
Middle third      1.0       0.0      1.0        1.5           0.5       3.0 
Lower third       1.5       0.0      1.5        1.5           0.0       3.0 
Total             3.0 

Teacher Oates's grouping strategy 
Total class       1.5 
Upper third       0.0       1.5      0.0        0.5           1.0       3.0 
Middle third      0.5       1.5      0.5        1.0           0.0       3.0 
Lower third       1.0       1.5      1.0        0.5           0.0       3.0 
Total             3.0 

Source: Harnischfeger and Wiley, 1976, Table 2. Used by permission. 

Table 2-3. The Effects of Grouping on Achievement (in achievement units). 

                     Achievement Group 
             Upper     Middle    Lower     Average 
Teacher      Third     Third     Third     Gain 
Buehler       2.5       2.75      2.25      2.5 
Ewald         3.0       3.0       3.0       3.0 
Oates         2.5       3.25      2.67      2.67 

Source: Harnischfeger and Wiley, 1976, Table 4. Used by permission. 

We find these data interesting for several reasons. One thing that 
Harnischfeger and Wiley do is draw our attention immediately to the 
average score gain in each classroom. But the strategies treat differ- 
ent students differently, and since each student's achievement gain mat- 
ters to us in the end, we must be wary of evaluating the merits of a 
strategy in terms of its effect on mean achievement. Teacher Ewald, 
with an average gain of 3 increases all the students' scores by exactly 
the same amount. Oates by contrast favors the middle third over the 
upper and lower thirds. Is Ewald's strategy superior to Oates's be- 
cause it results in a higher mean? Certainly not, unless we are pre- 
pared to neglect differential treatment of students as irrelevant to 
the evaluation process. If gains to the middle third are disproportion- 
ately valued, then Oates, whose strategy favors the middle, may be 
more efficient. 

It would appear that Buehler's strategy is inferior to those of Ewald 
and Oates under what many people would consider an acceptable 
value judgment. That is, a strategy is superior (more efficient) if it 
can be shown to raise some students' scores without reducing any- 
one's. Ewald's strategy raises every group's score by more than 
Buehler's; Oates's raises the middle and lower thirds and keeps even 
with Buehler for the upper third. We saw in our own example above 
how one could identify the set of superior strategies defined in this 
way. Our discussion showed that the choices among superior 
strategies can be made only with the aid of a utility function 
that can judge the relative values of achievement for different 
students. 
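
The comparison of the three strategies can be reproduced from the two tables. The sketch below assumes that each group's gain is simply hours times marginal product, summed over the three arithmetic settings, with "Seatwork (other)" contributing nothing; the function names are hypothetical. Under that assumption, and with the table values as transcribed here, the lower third under Oates works out to 2.25 rather than the 2.67 shown in Table 2-3 (2.25 is also what Oates's reported average of 2.67 implies), so on these numbers Oates ties Buehler on the upper and lower thirds and improves only the middle third.

```python
SETTINGS = ("total_class", "subgroup", "seatwork")
GROUPS = ("upper", "middle", "lower")

# Table 2-1: marginal product of one hour, by achievement group and setting.
MARGINAL_PRODUCT = {
    "upper":  {"total_class": 1.0, "subgroup": 2.0, "seatwork": 2.0},
    "middle": {"total_class": 1.0, "subgroup": 1.5, "seatwork": 1.0},
    "lower":  {"total_class": 0.5, "subgroup": 1.0, "seatwork": 1.0},
}

# Table 2-2: pupil hours in (total class, subgroup, arithmetic seatwork).
HOURS = {
    "Buehler": {"upper": (1.5, 0.5, 0.0), "middle": (1.5, 0.5, 0.5), "lower": (1.5, 0.5, 1.0)},
    "Ewald":   {"upper": (0.0, 0.5, 1.0), "middle": (0.0, 1.0, 1.5), "lower": (0.0, 1.5, 1.5)},
    "Oates":   {"upper": (1.5, 0.0, 0.5), "middle": (1.5, 0.5, 1.0), "lower": (1.5, 1.0, 0.5)},
}


def gains(teacher):
    """Achievement gain of each third: hours times marginal product, summed over settings."""
    return tuple(
        sum(h * MARGINAL_PRODUCT[g][s] for h, s in zip(HOURS[teacher][g], SETTINGS))
        for g in GROUPS
    )


def dominates(a, b):
    """True if strategy a leaves no group worse off than b and at least one better off."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


if __name__ == "__main__":
    for teacher in HOURS:
        g = gains(teacher)
        print(teacher, g, "average:", round(sum(g) / len(g), 2))
    print("Ewald dominates Buehler:", dominates(gains("Ewald"), gains("Buehler")))
    print("Oates dominates Buehler:", dominates(gains("Oates"), gains("Buehler")))
    print("Ewald dominates Oates:", dominates(gains("Ewald"), gains("Oates")))
```

The dominance test is the "acceptable value judgment" of the text: it can rank Buehler below the other two, but it cannot rank Ewald against Oates without a utility function.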

Another interesting aspect of the strategies that Harnischfeger and 
Wiley present is that in terms of any criteria that value only learning 
arithmetic, all of them are inefficient. Each assigns at least some of 
the students some of the time to do something called "Seatwork 
(Other)." This is presumably some kind of activity other than arith- 
metic that occupies student time but not teacher time. Clearly, if 
that time were simply transferred to arithmetic seatwork, the mean 
score of the affected groups would be raised. The teachers must have 
their reasons for assigning "Seatwork (Other)" activities, but 
we are not given any information about what the payoff from 
this work is or how it might be evaluated against further gains in 
arithmetic. 

Finally, we were immediately tempted to ask whether there 
existed one or more strategies other than the ones given that could maxi- 
mize even the average achievement gain. An allocation of three 
hours of seatwork to both the upper and lower thirds and of three 
hours of subgroup work for the middle third will give an average 
gain of 4.5 units. Thus, maximizing the average has the teacher ig- 
noring two-thirds of the class and putting all her or his effort 
into the single group where the relative (though not absolute) ad- 
vantage is greatest. This result illustrates more clearly than any 
other the bizarre outcomes we can get from trying to maximize the 
average. 

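The average-maximizing allocation can be checked by complete enumeration. The sketch below is an illustration under stated assumptions, not a method from the text: all three pupil hours go to arithmetic, time is divided in half-hour steps, the teacher's three hours are split between total class work and the three subgroups, and any pupil time not spent with the teacher is arithmetic seatwork. The function names are hypothetical.

```python
from itertools import product

# Table 2-1: marginal product of one hour, by achievement group and setting.
MARGINAL_PRODUCT = {
    "upper":  {"total_class": 1.0, "subgroup": 2.0, "seatwork": 2.0},
    "middle": {"total_class": 1.0, "subgroup": 1.5, "seatwork": 1.0},
    "lower":  {"total_class": 0.5, "subgroup": 1.0, "seatwork": 1.0},
}
HOURS_AVAILABLE = 3.0  # for the teacher and for each pupil
STEP = 0.5             # grid size in hours (an assumption of this sketch)


def group_gain(group, total_class, subgroup):
    """Gain for one third; time not spent with the teacher is arithmetic seatwork."""
    seatwork = HOURS_AVAILABLE - total_class - subgroup
    mp = MARGINAL_PRODUCT[group]
    return (total_class * mp["total_class"]
            + subgroup * mp["subgroup"]
            + seatwork * mp["seatwork"])


def best_average_allocation():
    """Enumerate every split of teacher time and keep the highest average gain."""
    grid = [k * STEP for k in range(int(HOURS_AVAILABLE / STEP) + 1)]
    best = None
    for c, su, sm, sl in product(grid, repeat=4):
        if c + su + sm + sl > HOURS_AVAILABLE:  # teacher's time budget
            continue
        gains = (group_gain("upper", c, su),
                 group_gain("middle", c, sm),
                 group_gain("lower", c, sl))
        average = sum(gains) / len(gains)
        if best is None or average > best[0]:
            allocation = {"total_class": c, "upper": su, "middle": sm, "lower": sl}
            best = (average, allocation, gains)
    return best


if __name__ == "__main__":
    average, allocation, gains = best_average_allocation()
    print("best average gain:", average)
    print("teacher hours:", allocation)
    print("gains (upper, middle, lower):", gains)
```

On this grid the search returns the allocation described in the text: the teacher spends all three hours with the middle third while the other two groups do seatwork.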
LEARNING CURVES AND THE 
FREQUENCY OF LESSONS 


If learning curves display diminishing returns to time spent and also 
require set-up times, we can work out an instructive example of how 
to analyze the question of optimal frequency and duration of les- 
sons. The example shows how these questions might be analyzed in a 
way that relates to variables that we could observe in the world. 

Consider that there is a measure of student competence by de- 
tailed subject matter and call this measure S for test score. Confining 
ourselves to production in school, consider the following linear pro- 
duction function for schools 


$$\Delta S = \sum_i \sum_j \sum_k \sum_l a_{ijkl} T_{ijkl}, \qquad (2.3)$$


where the flow of added competency, $\Delta S$, is due to standardized 
time, T, spent on different tasks related to learning the subject, and 
a is the marginal score produced by another unit of T. Obviously, 
a is very sensitive to the design of the test, but if test design is con- 
trolled, a is a measure of the marginal product of T. Since we expect 
productivity to vary with the particular subject being taught, the 
techniques of instruction, the characteristics of the teachers, and the 
characteristics of the students, the as and Ts are indexed over at least 
four dimensions. 

One can develop some relations between the Ts that will exist in 
the optimal school, given valuation of scores for different students 
and subjects and given the as or technical relations. But here we wish 
to focus on the issue of standardized time when we know that in 
designing lesson length there is a trade-off between set-up costs 
(getting ready for the new lesson) and fatigue associated with spend- 
ing too much time on one task. We want to partition the lesson 
length problem from the time allocation to subject matter problem. 
Time, T, is really a proxy for a unit of learning here. 

Perhaps the meaning will be clearer as we proceed with a particu- 
lar specification. Suppressing the subscripts, the amount of learning 
(or standardized time, T) will be the product of the frequency of the 


lesson, F, and the learning L, per lesson, which is itself a function of 
the duration, d, of the lesson: 


$$T = F \cdot L(d).$$


Assuming that there are diminishing learning returns to increasing 
duration, L(d) will look like the curve in Figure 2-12. Because 
$\partial L/\partial d > 0$ and $\partial^2 L/\partial d^2 < 0$, there is a trade-off in producing a given 
level of T between higher frequency or higher duration (see Figure 
2-13). The $MRTS_{F,d}$ (the marginal rate of technical substitution of 
F for d) is the ratio of the marginal T products of F and d: 

$$MRTS_{F,d} = \frac{\partial T/\partial F}{\partial T/\partial d} = \frac{L(d)}{F\,\partial L/\partial d}.$$

Figure 2-12. Learning as a function of the duration of a lesson. 

Figure 2-13. The trade-off between frequency and duration of lessons for 
different levels of learning. 



This tells us about the technical psychological trade-off between fre- 
quency and duration. To reach an optimal decision, we need to know 
about costs of making alternative decisions. In the case of schools 
with inputs fixed in the short run, the relevant costs are time costs, 
$C_T$. These fall into two categories: set-up time for each lesson, $C_S$, 
and the time devoted to the lesson itself, d. We can write 

$$C_T = F \cdot C_S + F \cdot d.$$

The marginal time cost of frequency of lessons is $\partial C_T/\partial F = C_S + d$, 
and the marginal time cost of duration of lessons is $\partial C_T/\partial d = F$. 
The ratio of these marginal costs, $(C_S + d)/F$, must be set equal to 
the $MRTS_{F,d}$ at the point where minimum time is spent on the pro- 
duction of standardized learning time. Figure 2-14 shows the selec- 
tion of optimal lesson lengths, d*. The ratio of marginal costs is a 
straight line with slope equal to 1/F and intercept equal to $C_S/F$. 
Because of the shape of L(d), the $MRTS_{F,d}$ function starts at the 
origin and is convex from below. Select d* where 

$$\frac{L(d)}{F\,\partial L/\partial d} = \frac{C_S + d}{F}.$$

Figure 2-14. Choice of optimal lesson length. 

Notice that the optimal lesson length is independent of the amount 
of T that is being produced. (This also assumes an interior solution 
and no other binding restraints.) 
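
The condition for d* can be illustrated with a specific learning curve. The sketch below assumes $L(d) = d^{\theta}$ with $0 < \theta < 1$, a functional form chosen here only because it has diminishing returns; it is not taken from the text. Multiplying the condition through by F gives $L(d)/L'(d) = C_S + d$, which for this curve has the closed form $d^* = \theta C_S/(1 - \theta)$. The code solves the condition numerically by bisection and also evaluates the total time cost $F(C_S + d)$ of producing a given T, where $F = T/L(d)$. All function names are hypothetical.

```python
def lesson_learning(d, theta):
    """Assumed learning per lesson, L(d) = d**theta, with diminishing returns."""
    return d ** theta


def marginal_learning(d, theta):
    """Derivative of the assumed learning curve, L'(d)."""
    return theta * d ** (theta - 1)


def optimal_duration(setup_time, theta, lo=1e-9, hi=1e6, tol=1e-10):
    """Solve L(d)/L'(d) = setup_time + d for d by bisection."""
    def excess(d):
        ratio = lesson_learning(d, theta) / marginal_learning(d, theta)
        return ratio - (setup_time + d)

    for _ in range(200):
        mid = (lo + hi) / 2
        if excess(mid) < 0:
            lo = mid   # tangency not yet reached: lengthen the lesson
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


def time_cost(d, target_learning, setup_time, theta):
    """Total time F*(C_S + d) needed to produce T when lessons last d."""
    frequency = target_learning / lesson_learning(d, theta)
    return frequency * (setup_time + d)


if __name__ == "__main__":
    theta = 0.5
    for setup in (5.0, 8.0):
        d_star = optimal_duration(setup, theta)
        closed_form = theta * setup / (1 - theta)
        print(f"set-up {setup}: d* = {d_star:.4f} (closed form {closed_form:.4f})")
    for target in (10.0, 100.0):
        costs = {d: round(time_cost(d, target, 5.0, theta), 3) for d in (4.0, 5.0, 6.0)}
        print(f"T = {target}: time cost by lesson length {costs}")
```

Because neither F nor T appears in the solved condition, the same d* minimizes time cost at every target level of learning, and a larger set-up time gives a longer optimal lesson.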

This is, of course, a static equilibrium formulation of the problem, 
and we can only suggest some of the dynamics. For example, if set- 
up time increases because of a decline in discipline in the school, the 
cost curve will shift up, and d* should increase. But it may not be 
possible in the short run to readjust the schedule of the school, par- 
ticularly if times get fixed by lesson plans in textbooks and by 
school customs. In that case, the analysis may have to incorporate 
adjustment costs. Similarly, if there is a desire to increase the empha- 
sis ($T_1$ to $T_2$) on a particular subject (e.g., toward science because 
Russia sends up a Sputnik), there may be more problem in adjusting, 
say, frequency rather than duration, and the school may be in tem- 
porary disequilibrium with an adjustment path shown by the arrows 
in Figure 2-15. Alternatively, we could show the relation as in Fig- 
ure 2-16. Here, the change in the utility of the composite score can 
be shown as a function of the frequency of the optimal length les- 
sons in a particular subject. This curve is concave from below, 
reflecting the diminishing marginal utility of more and more empha- 
sis on one subject to the exclusion of others. With a shift in tastes 
toward the given subject, the utility curve shifts to the northeast, 
and frequency should increase. But it may not be able to do so im- 
mediately. 

Figure 2-15. Adjustment of frequency and duration of lessons over time. 

Figure 2-16. Changes in the optimal frequency of lessons because of changes in 
taste. 


As a final note, we must realize that we are thinking of learning 
where there is always more to [...] at all about situations where 
[...]. In production terms, lessons may [...]? We do not know. 


INCENTIVE STRUCTURES AND STUDENT 
ALLOCATIONS OF EFFORT 


In the typical thinking about production of education in school, 
students are treated as passive inputs or raw materials that get 
processed by other factors of production. As we have emphasized, 
in the multiple output, joint products case, it is the allocation of 
resources within the unit of production that becomes the important 
component of the analysis. In the last section, we focused on the 
allocation of teacher effort and continued to assume that students 
were passive recipients of assignments, that is, that they do what they are 
told and that even though learning curves may differ systematically 
among students, the parameters of those curves are given to the 
teacher, who has to make decisions accordingly. If students respond 
in fixed ways to teaching techniques, if teachers know a lot about 
these responses, and if the classroom is authoritarian, this might be 
an adequate model of intraclassroom allocation. 

What we are saying is that the learning curves embody both 
“ability” and “motivation” or “effort” and that the teacher can pick 
the optimal points on the learning curve for each student and sub- 
ject. Yet in making such assumptions we would be ignoring the 
potential relevance of several important schools of thought. The 
recent literature on the optimal lifetime accumulation of human 
capital (see, for example, Heckman [1976]) concentrates on the 
choices of individuals who have to allocate their time among con- 
sumption, production, and investment in human capital. To the 
extent that children have freedom, they must make choices about how 
to allocate their fixed time to alternative activities. Unfortunately, one 
of the frequent assumptions of this literature is that adults act so as 
to maximize the present value of lifetime income, and it is hard to 
think of children being so calculating over such a long planning 
horizon. But it is clear that children do have to make the decision 
about how to spend their time within the constraints imposed on 
them. It is very likely that the field of labor economics, which deals 
with the behavior of labor inputs in the production process, has 
many suggestive analogies that should be considered in analyzing 
student behavior in schools. And it is the range of student choices 
that those most familiar with the institutional literature on schools 
(cf. Thomas, 1977) have emphasized. In this section, we will suggest 
some of the important issues raised by student choice. 

When we discuss student allocation of effort, we need to focus on 
two dimensions: time spent on particular activities and the intensity 
of work performed in that time. In the production context, we 
would be talking about the number of hours working on the as- 
sembly line and the speed at which the line operates. In the learning 
curve context, we are talking about where the student is on the 
curve (i.e., how much time is spent on the activity) and also the 
slope of the curve. While the measure of time spent is relatively 
straightforward in theory, the measure of pace, intensity, or diffi- 
culty may be quite hard to identify. The problem lies in 
the meaning of learning curves and ability. A learning curve tells 
how much a student learns in any period of time under given condi- 
tions of production. That curve reflects both ability and effort. As a 
practical matter, ability would have to be defined as that component 
of learning that was fixed from the point of view of the [...], and 
effort would be that component of learning that the student could 
adjust under the right incentives. It is exactly analogous to the dif- 
ference between the short and the long run in economic analysis of 
the firm. The “short run" is the economist's shorthand for optimiza- 
tion when some of the production variables are fixed and some are 
adjustable, and the “long run" describes optimization when all 
variables, including fixed plant and equipment, can be adjusted. The 
shapes of learning curves may appear to be fixed for any technique 
of teaching, but there may be costly policies that would modify 
those curves but be quite beyond the ability of policymakers to 
change in the short run. These might range from nutrition of the 
fetus to parents' attitudes toward school. On the other hand, there 
may be costly policies (and remember that when economists use the 
word "costs" they do not mean money costs only) that could change 
the slope of the learning curve in the short run.⁹ 

In standard production [...] have already explored all [...] 
frontier, given their [...] talking about how to [...] to get the 
"best" [...] student's problem [...] make the "best" [...] 
manipulated by [...] markets are the [...] things. Perhaps the [...] 
was Ronald Coase's [...]. Coase emphasizes [...] of using the [...] 
and that the most important are the [...] 
relevant prices are" and the "cost of negotiating and concluding a 
separate contract for each exchange transaction which takes place 
in a market . . . " (p. 336). We know of no studies that evaluate such 
costs, but the fact that classrooms using formal market mechanisms are 
rare might suggest that the costs are perceived to be considerable. But 
even where a formal market mechanism is not visible, the teacher-man- 
ager still has to deal with the problems that a market makes explicit. 

Consider the teacher's problem of manipulating the student. The 
teacher has two main control variables—the rewards that can be given 
to students and the pace at which the material to be learned is 
covered. These are shorthand for a host of things. Rewards would 
include grades, gold stars, enlisting the parents for home reinforce- 
ment of the teacher's incentives,¹⁰ letting the student spend more 
time on desirable or fun activities, and the like. There is also the life- 
time component of rewards for performance in school. Better 
students get to go farther in school and have both higher income and 
higher status jobs. Since grades are a good predictor of the amount 
of schooling that will be taken,¹¹ we prefer to ignore the question of 
the relative importance of lifetime and immediate rewards. It may 
be, however, that student variations in utility assigned to higher 
grades reflect variations in their valuations of the lifetime conse- 
quences of such grades. 

Similarly, the pace of work refers to a whole variety of character-
istics of the learning situation. In some sense, pace and difficulty of
the material covered are almost indistinguishable, since speed must
always be relative to the level of difficulty. Thomas (1977) stresses
the attractiveness and quality of complementary inputs such as
“well-educated parents, competent teachers . . . ; well-written text-
books, well-equipped science laboratories, and good libraries . . .”12
(p. 108). All of these are analogous in the labor economics literature
to job characteristics (e.g., safety, boredom, physical discomfort)
for which there will be compensating wage variations.

We are ready now to develop a simple model of student choice of
effort or pace. The two major elements of it are (1) the student's
preferences with respect to rewards and pace, and (2) the incentive
system imposed by the teacher or school authorities. The two parts
together will determine the level of student effort where the student
has choice.13

Consider a student who has well-behaved preferences with respect
to the pace of instruction and the rewards that he or she is receiving
in school. We can, following usual conventions, write a utility func-
tion for such a student and draw the contour of the function (an in-
difference map):
$$U^1 = u^1(R, P), \quad \text{and}$$

$$MRS_{P,R} = -\left.\frac{dR}{dP}\right|_{U^1 = \text{constant}},$$


where R is rewards, P is pace, $U^1$ refers to the utility function of the
first student, and $MRS_{P,R}$ is the marginal rate of substitution of
pace for reward that leaves a student's level of utility unchanged. Fig- 
ure 2-17 displays some indifference curves (combinations of pace 
and reward among which the particular student is indifferent) for 
two different students. Higher curves represent higher levels of satis- 
faction (i.e., higher reward for a given pace). 

The two students we happened to pick had somewhat different 
attitudes toward pace. For the first student, higher pace is always a 
“bad” that needs to be offset with some reward. But for student 2, 


Figure 2-17. Indifference curves for two students. 
too slow a pace is also undesirable (perhaps because of boredom or a
feeling that valuable time is being wasted) so that the marginal rate 
of substitution of pace for rewards actually changes sign. It will be 
important later to realize that a student with higher ability is more 
likely to have the preferences of student 2 on the grounds that higher 
ability means that a given pace would not be so arduous and too slow 
a pace may induce boredom. Let us ignore this for the moment. 

Now that we have a map of students’ preferences, we need to 
know about the incentive system that, given their preferences, will 
determine their pace. We need two relations to generate the reward 
structure for pace—a relation describing how rewards depend on 
learning (output) and pace and a relation (production function) 
describing how learning depends upon pace, ability, and time spent 
on the activity: 


$$R = r(Q, P) \quad \text{(the general reward structure)},$$
$$Q = q(P, A, t) \quad \text{(the production function)},$$

where R is reward, P is pace, Q is measured output or learning, A
is ability, and t is time spent on the activity. The reward structure for
pace will be those values of R and P that are consistent with these
two equations.

To proceed, we need to know what these functions look like.
There is some “experimental” evidence on the shape of the reward
structure. Figure 2-18 shows the apparently typical results of
asking adults playing teacher to assign rewards to various students 
who display varying amounts of effort, ability, and achievement on 
tests measuring learning. It is clear that rewards are given not only 
for performance but also independently for higher effort. There is 
also an ability component to the reward structure, although it takes 
the form of penalizing “underachievers,” and we regard it as an
additional correction for inadequate effort. A reasonable question is 
why effort should be separately rewarded from output, which re- 
quires both effort and ability. We can think of several reasons. First, 
by paying extra rewards for effort, the incentive system is placing a 
premium on the cause of learning that is, by definition, most under 
the student's power to control. Second, as Bowles and Gintis (1976) 
and others have pointed out, schools do more than teach cognitive 
skills, and hard work is one of the behaviors that they try to in-
culcate so that students may become productive members of society.
Third, the harder a student is working, the easier it is for the teacher 
to pursue the goals for the class. In terms of the input exhaustion
for joint production, high effort and the order implied by such be-
havior should make a jointly applied input go further. Let us defer
the more difficult question of just what the shape of the reward
structure ought to be.

Figure 2-18. Evaluation (reward and punishment) as a function of pupil ability,
motivation, and examination outcome. [Legend: Ability and Motivation (AM);
Ability and no Motivation (A-M).]

Source: Weiner and Kukla, 1970. Copyright 1970 by the American Psychological
Association. Reprinted by permission.

The shape of the production relation (or learning curve) has al- 
ready been partially discussed. We might assume that the curve dis- 
plays diminishing returns to increased pace, and indeed it is not hard 
to imagine cases of negative marginal returns where higher pace be- 
gins actually to confuse the learner. 

Figure 2-19 puts the two relationships together. All variables are
measured in positive units from the origin of the graph. The south-
east quadrant shows the production relation for a given level of
ability and given time spent on the activity. The northwest quadrant
is a stylized version of Figure 2-18 where R is approximately linear
in Q and P. The northeast quadrant shows the locus of combinations
of R and P that are consistent with the given equations. One finds
such points by picking a P (e.g., P1), finding the level of Q that will
be produced, and then finding the level of R that corresponds to that
P and Q. We can see that a decline in the marginal product of pace
will reduce the slope of the reward for pace curve, although that
could be offset by changes in the slopes or intercepts of the general
reward structure. By manipulating the curves in the northwest
quadrant, we can trace out a variety of different reward for pace
curves. Indeed, if we made the partial derivative of reward with
respect to pace sufficiently negative (e.g., by punishing students
that get too far ahead of the others), we could draw a downward
sloping reward for pace curve.

Figure 2-19. Generation of reward structure for higher pace.
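The construction of the reward for pace curve (pick a pace, find the output it produces, then find the reward attached to that pace and output) can be sketched numerically. The functional forms and weights below are illustrative assumptions and do not come from the text: `production` is a hypothetical concave stand-in for Q = q(P; A, t), and `reward` is a hypothetical linear stand-in for R = r(Q, P).

```python
def production(pace, ability=1.0, time=1.0):
    """Hypothetical learning curve Q = q(P; A, t) with diminishing
    returns to pace (marginal product falls as pace rises)."""
    return ability * time * (pace - 0.05 * pace ** 2)


def reward(output, pace, w_q=1.0, w_p=0.5, intercept=0.0):
    """Hypothetical general reward structure R = r(Q, P), linear in Q and P."""
    return intercept + w_q * output + w_p * pace


def reward_for_pace(paces, **reward_kwargs):
    """For each pace P: find Q = q(P), then R = r(Q, P).
    The resulting (P, R) pairs trace the reward for pace curve."""
    return [(p, reward(production(p), p, **reward_kwargs)) for p in paces]


paces = [0, 2, 4, 6, 8, 10]

# Effort is rewarded separately from output (positive weight on pace).
baseline = reward_for_pace(paces)

# Getting ahead is punished strongly enough that the curve slopes downward.
penalized = reward_for_pace(paces, w_p=-1.5)

for (p, r_base), (_, r_pen) in zip(baseline, penalized):
    print(f"P={p:>2}  R(baseline)={r_base:6.2f}  R(penalized)={r_pen:6.2f}")
```

Under these assumed forms the baseline curve rises at a decreasing rate, because the marginal product of pace falls, while the penalized curve falls throughout. Changing `w_q`, `w_p`, or `intercept` corresponds to shifting the general reward structure.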
Figure 2-20. Selection of pace.


The two parts of the decision problem are joined together in
Figure 2-20. Three different reward structures are displayed there:
$R^1$, $R^2$, and $R^3$. Given the reward structure that is attainable, the
student tries to get to the highest possible indifference curve. That
tangency will determine reward and pace. These points are shown
as triangles for student 1 and as small circles for student 2. We can
see that by manipulating the reward structures, we can obtain a
variety of different distributions of learning in the classroom. Fur-
ther, if we were to insist on everyone proceeding at the same pace,
we could do so with heterogeneous student tastes and homogeneous
abilities only if we had reward structures that correspond to each
different type of student. Thus, applying $R^3$ to student 1 and $R^2$
to student 2 will cause both to proceed at pace $P^1$.15 In the example
drawn, it should also be noted that each would be much happier with
the other's incentive scheme. What are the costs of maintaining such
a differentiated reward structure within the same classroom? What
are the costs of having everyone proceeding at their own pace? These
are the kinds of questions that this sort of analysis suggests.
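A small numerical sketch may help fix the idea of pace selection: each student moves along the attainable reward for pace curve to the point of highest utility. Everything specific here is an assumption made for illustration: the quadratic utility functions, the concave reward curve, and the grid search used in place of a formal tangency condition. Student 1 treats pace as always a “bad”; student 2 also dislikes a pace below a hypothetical comfort level of 4.

```python
def reward_for_pace(pace, w_p=0.5):
    """Hypothetical reward for pace curve: reward for output plus a
    separate reward (weight w_p) for pace itself."""
    output = pace - 0.05 * pace ** 2  # assumed concave learning curve
    return output + w_p * pace


def utility_student_1(reward, pace):
    """Pace is always a 'bad' that must be offset by reward."""
    return reward - 0.1 * pace ** 2


def utility_student_2(reward, pace):
    """Too slow a pace (below 4) is also undesirable, so the marginal
    rate of substitution changes sign at pace = 4."""
    return reward - 0.1 * (pace - 4.0) ** 2


def chosen_pace(utility, w_p=0.5, max_pace=10.0, n_steps=1000):
    """Grid search for the utility-maximizing pace along the reward curve."""
    grid = [max_pace * i / n_steps for i in range(n_steps + 1)]
    return max(grid, key=lambda p: utility(reward_for_pace(p, w_p), p))


pace_1 = chosen_pace(utility_student_1)
pace_2 = chosen_pace(utility_student_2)
pace_1_stronger = chosen_pace(utility_student_1, w_p=1.0)

print("student 1:", pace_1)
print("student 2:", pace_2)
print("student 1 with a larger reward for pace:", pace_1_stronger)
```

With these assumed forms, student 2 chooses a faster pace than student 1 under the same reward structure, and raising the reward for pace moves student 1 to a faster pace. Getting both students to the same pace therefore requires a different reward structure for each.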

How should the incentive structure, and the distribution of en-
suing efforts, be designed? In the case where the teacher or other
school authorities have well-defined preferences over alternative
distributions of school outcomes $(Q_1, Q_2, \ldots, Q_n)$, they would pick
that feasible incentive scheme that maximized their utility function.
The reward structure would be associated with marginal valuation of
outputs just as wages are related to marginal value product in the
neoclassical economic model of the labor market.

There has been some recent interest shown by economists in 
questions of the design of incentive systems, and some of these are of 
potential interest to the problems we have been discussing here. For 
example, is it better to reward on the basis of measured inputs of 
time, effort, and ability, or is it better to reward performance? In the 
labor market context, this is the difference between wage rates for 
hours worked and piece rate payments for output. The answer de- 
pends on the relative costs of measuring the various inputs, the pro- 
duction costs of making a measurement mistake, and the conse- 
quences of different systems of compensation for the supply of 
inputs. For example, there is measurement error in all of the variables 
in the reward structure equation presented above. We have observed 
grade inflation16 in recent years. Is it because in more open and
decentralized classrooms, there is an increase in the measurement
error for effort, time allocation, and ability variables, and the
teacher's loss function is asymmetrical such that making errors by
giving too low a grade incurs relatively high losses? Or is it that
in decentralized open classes the premium for input allocation has
had to rise in order to induce students to do voluntarily what they
used to do under force, so that the grade associated with any
performance ability level is higher? What are the consequences of such
phenomena for the structure and stability of the school [. . .]
system and the portions of the economy that rely on such [. . .]?
How will changes in the relative weights of performance and input
affect morale in the classroom? These are all questions that are of
interest to economists as well as to sociologists because production
and allocation will be affected.

The issue of measurement error and informational [. . .] is
especially important in discussing incentive systems. We supposed
that students were approximately homogeneous in abilities,
but this is not very likely in any randomly drawn group of students.
This may be symptomatic of a more general uncertainty about the 
precise shape of the learning curve. The teacher in such a situation 
has two general strategies that might help. First, the teacher might 
use some indicator variable!? that is correlated (the teacher believes) 
with the variable in question, in this case ability and marginal pro- 
duct of time and pace. Indicator variables might be diagnostic results 
(cf. Rosenthal and Jacobson, 1968) or social or class status indica- 
tors. They may not be very good, but we are assuming that better 
indicators are not worth the expense of acquisition. Indeed, there is 
an implicit market for information, and one could work out the 
appropriate rules for determining whether better diagnostics are 
worth the extra cost. Under the right conditions, including some 
restrictions on the teacher's utility function, the teacher would 
allocate time and other resources to the students with the higher ex- 
pected abilities (because of their higher marginal products) and make 
the beliefs self-fulfilling because students who received more inputs 
would indeed show more learning. 

The second teacher strategy would be to let the equations por- 
trayed in, say, Figure 2-20 operate to sort the students by ability. 
One help to successful sorting would be a correlation between the
abilities of students and their preferences. We would
expect student 2 to have higher ability than student 1. This means 
that student 2's learning curve is higher. The consequences of this are 
shown in Figure 2-21, where $R^2$ is the reward for pace curve that
would be generated for student 2 and $R^1$ is the reward for pace
curve for student 1. Both the differences in generated rewards and 
the differences in pace would help to sort the students by ability. 
That differences in pace reflect ability is the notion behind what
Dahlloff (1971) emphasizes in the analysis of steering groups or
frame theory. Some educational researchers believe that teachers
often set their pace so that the tenth to twenty-fifth percentile in
the class is just keeping up. Why teachers look to this “steering
group” in determining pace is unclear. On sampling grounds alone
it makes sense to watch just a few children's progress with regu-
larity. But the important point is that the “steering group” is fixed,
and it seems as if the teacher has used desired pace as an index of 
basic ability in the way our model predicts. Why the pace is set too 
fast for the lowest portion of the class is an interesting question. 
Akerlof (1976) has a model of the speed of the assembly line that
predicts it will be set so everyone is working faster than they would
like because rewards are based on the average product of the line and
by turning up the speed the slowest workers are weeded out and
rewards do not have to be shared with that group. Is this what hap-
pens in classrooms where ability groups get tracked in what is [. . .]
a self-selection process? It is an interesting question to which we
do not have an answer.

Figure 2-21. Reward for pace when abilities vary.

In this section of the chapter, we have sketched just the bare
outline of what we would take to be the elements of an analy-
sis of the student supply of effort and its role in the production of
learning in schools. This has been the domain of psychologists and
sociologists, but in thinking about these matters, we have come to
believe that perhaps economists also have something to contribute.
CONCLUSIONS


In our critical analysis of the literature on production functions for 
schools, we have become more convinced than ever that the models 
most used are also most useless. That is a harsh judgment on a boom- 
ing industry, but the problem has been that tools always have to be 
modified to fit the institution studied, and that is only now be- 
ginning to happen in the new field we call the economics of school- 
ing. If multiple outputs and joint production are not a minor feature 
of our system of education, the important questions are going to re- 
volve around determining values for multiple outputs and analyzing 
the allocations within what have been the typical units of analysis. 
If the production function literature and its apparent failure to 
generate sensible results causes this redirection and reanalysis of such 
an important sector of our society and economy, one could hardly 
write the work off as unimportant. Our hope is that in developing 
tools to deal with the problems of schooling, economists will also 
refresh their own discipline. Indeed, the journals are increasingly 
filled with articles that deal with questions of optimal incentives, 
joint production, and other features that are so essential to schooling 
institutions and to many other institutions in our society as well. 

We will close this chapter with a hint of our own research agenda 
over the next couple of years. This will reassure our readers that our 
criticism has not bred despair, though foolishness may be a word that 
comes to some minds. Time is the most important scarce commodity 
that gets allocated in schools. It is clear to us that as classrooms be- 
come more open and decentralized, the teacher needs to be a good 
manager as well as an expositor. We have witnessed classrooms where 
the decentralization was so extensive that we could not help standing 
in awe at teachers who could manage such a situation effectively. 
This suggests the following steps: (1) collect or use other researchers' 
data on time allocations in classrooms to test some of our theories 
and to calibrate models of classroom organization and allocation, 
(2) use such models to develop and evaluate various rules of thumb 
about teaching and also perhaps to help in the training of teachers, 
and (3) use such models as a framework for evaluating new instruc- 
tional techniques. Our own view (Brown and Saks, 1978) is that such 
work can be done effectively only in collaboration with experts from 
other disciplines and with people who understand what goes on in 
classrooms. Collecting data and working with practitioners are not 
things that economists have had much experience doing. We are en- 
couraged, however, by the fact that people as diverse and thoughtful 
as Barr and Dreeben (sociology), Thomas (education), and Walberg 
are all converging with us on similar perspectives and,
more important, on agreement about the key questions that need to
be asked. As we proceed, however, maybe we should be looking
out for another dimension that we seem to be leaving
behind. Perhaps this is a good point to remember Weber's plea from
the closing passages of The Protestant Ethic and the Spirit of Capitalism:

the modern economic order . . . is now bound to the technical and
economic conditions of machine production which today determine the
lives of all the individuals who are born into this mechanism, not only
those directly concerned with economic acquisition, with irresistible
force. . . .
No one knows who will live in this cage in the future, or whether at the
end of this tremendous development entirely new prophets will arise, or
there will be a great rebirth of old ideas and ideals, or, if neither, mech-
anized petrification embellished with a sort of convulsive self-importance.
For of the last stage of this cultural development, it might truly be said:
“Specialists without spirit, sensualists without heart; this nullity imagines
that it has attained a level of civilization never before achieved.” (1958:
181-82)
from the classroom perspective is in
5. Take an example of the application of the above theory to a school
resource allocation problem. Thomas (1977:67-69) analyzes the case of two
students for whom the production (learning) functions involving time differ.
He equates higher total productivity with greater efficiency (1977:2).
Thus he says, “Students with higher efficiency (or time value) can produce
more learning in a given period of time.” We are on the verge of a con-
[. . .] on to add that “production theory suggests that in their [students with high
efficiency] cases, a given amount of students' time will be combined with a 
greater than average amount of purchased resources" (p. 67). Thomas cites the 
proverb, “To them who hath shall be given” (p. 68). Frankly, we do not under-
stand the argument, and its conclusions seem inconsistent with the results of
economic theory. Efficiency requires that resources be allocated where the 
additional (i.e., marginal) payoff is greatest, not where total payoff is already 
highest. 

In the context of classical production theory, how should time and other 
resources be allocated to two students when one of them always achieves at a 
higher level than the other for any input combination? Suppose the students 
each have the same amount of their time devoted to a learning process and 
that the question is, How should purchased inputs be allocated between them?
The answer is that the inputs should be allocated so as to get on the production 
possibilities curve, rather than be stuck inside it. Further, we want to choose 
the right point on the possibilities curve, based on the values to the teacher 
of the outcomes of the students. This argument does not favor the high achiev- 
ing (“efficient”) student or, in fact, the low achiever. Indeed, the optimal 
allocation is determined as much by tastes as by productivity. Thus Thomas's 
conclusion that “if we make comparisons among school districts or within 
classrooms, we will expect to find that those students who have already re- 
ceived large investments at home and in school for the development of their 
cognitive skills will be likely to be the recipients of larger educational expendi- 
tures, better quality teachers, and superior books and equipment than those 
students who have not benefited from substantial home and school invest-
ment” (1977:68) is unwarranted. The rich may indeed get richer, but an eco-
nomic rationale would depend either on their having higher marginal (not total)
outputs or on their being favored individuals in the scale of social values. The 
teachers have to have “elitist” preference functions in our (Brown and Saks, 
1975a) sense. 

6. Notice that under the usual assumptions about utility functions not 
every point on the curve is superior to each point inside it. What is generally 
the case is that for any point inside the curve, say P, there is some point on the 
curve that is better. 


7. The only possible exception appears to be some measure of time, usually 
student days attended. 
8. For some reason, when Thomas (1977) addresses this problem, he wor-
ries not about the marginal product of time (the slope of the learning curve) but
about the average product of time. He defines “the quality of a student's time as
the ratio of learning produced (in a given subject) to time expended . . .” and re-
marks that it is equivalent to student “efficiency” (1977:58). In terms of classi-
cal production theory, it is marginal learning per unit of time spent that is im-
portant. Presumably, students allocate their efforts or have them allocated so
that marginal value learning per unit of additional allocated time is the same
for all activities.
9. Some writers, including even Thomas (1977:58), try to separate nature
(genes) from nurture (past human capital investment) in accounting for the
shape of the learning curve. This may be an interesting question for
[. . .], but it has nothing to do with economics. For the economist, the
parameters are for whatever reason fixed with respect to some policies and
variable with respect to others. After all, genetically caused problems are not
necessarily best dealt with by genetic policies. Myopia is efficiently handled by
[. . .] eyeglasses. Knowing that the deficiency is caused by unusual genes
is not of interest in evaluating a policy of providing appropriate corrective
lenses.
10. [. . .] is analogous to the effect of unearned income on labor supply in the
[. . .] consumer model. If the child gets a high level of reward at home ir-
respective of school performance, the teacher's incentive structure will have a
different impact on effort than if parents reinforce that structure.
11. See Wallach (1976) for a discussion of the relation between test scores
and grades and success. He finds a weak relation. Between school-
ing and success, though, there is ample correlation. For a
good review, see Welch (1974).
[. . .]nal differentiation and student ability. We bel[. . .]
different way (see below).
[. . .] to extend the analysis to student determination of time spent in
[. . .]nd the student obeys. In decentralized classrooms or where homework is im-
portant, this is an important omission.
14. These results are reported in Weiner (1976) and are from a study by Weiner
and Kukla (1970).
15. [. . .] different instructional methodologies [. . .]
[. . .] is in terms of student time allocation, [. . .]
also applies to effort. It is an empirical question [. . .]
“technological differentiation” outweigh the costs.
16. For some evidence on this, see Bills (1977). 
17. Readers who wonder about this might try the following experiment.
Next time a student comes to complain about a grade, begin by asking whether
the student felt the grade was too high or too low.
Chapter 3


The Role of Levels of Analysis 
in the Specification 
of Education Effects* 


Leigh Burstein 
University of California 
Los Angeles 


One of the chief functions of educational research is to
specify school- and home-based factors that influence [. . .]
and to ascertain their effects. [. . .]
school effects (e.g., Alwin, 1976; Coleman et al., [. . .]
1975; Peaker, 1975; Sorenson and Hallinan, [. . .]),
the economics literature on educational production functions (e.g.,
Averch et al., 1972; Brown and Saks, 1975; Hanushek, 1972; Mur-
nane, 1975), the educational psychology literature on teacher-
classroom effectiveness (e.g., Berliner et al., 1976; Brophy, Biddle,
and Good, 1975; McDonald and Elias, 1976), and the literature on
evaluation of educational interventions (e.g., Circirelli et al.,
1969; Cline et al., 1974; Smith and Bissell, 1970; Stebbins et al.,
1977).

Invariably, studies in each of these areas encounter complications
created by the nature of the educational enterprise. A central element
of these complications is that educational data are inherently multi-
level. That is, education involves students taught by teachers in class-
rooms in schools in school districts and so on. As a consequence,
one would expect that any attempt to identify the factors that 
affect educational performance would involve analyses of multi- 
level educational data. 


OBJECTIVES AND ORGANIZATION 


The purpose of this chapter is to present a comprehensive treatment 
of the role of levels of analysis in the specification of educational 
effects. The issues to be considered include: 


1. The conditions under which unbiased estimates of microlevel 
(individual level) processes can be derived from macrolevel (ag-
gregate or group level) data (problems in cross-level inference);

2. The identification and interpretation of the effects of macrolevel 
variables on microlevel processes (group-structural-contextual- 
composition effects); 

3. The basis for choosing an appropriate unit of analysis in a given 
context (appropriate unit of analysis); and

4. The specification of an appropriate analytical model in the es- 
timation of relationships from multilevel data (specification of 
analytical model). 


Like multilevel educational data, the four issues are hierarchically 
nested in many respects. The last issue—specification of an analytical 
model—subsumes the rest. The results from work on each issue will 
be discussed in turn, and problems that warrant further attention 
will be identified.

The emphasis throughout will be on large-scale regression-based 
analyses of multilevel data from surveys, quasi-experiments, and field 
studies. I will not attempt to address the large literature on multi- 
level data in experimental research or research with strictly ordinal 
or categorical data except as these areas shed light on issues in other 
types of research. 


PROBLEMS IN CROSS-LEVEL INFERENCE 


The difficulties involved in making inferences across levels (units) of 
analysis have been extensively investigated by social scientists (e.g.,
Hannan, 1971). Work on change in units of analysis (Blalock, 1964), 
ecological inference (Alker, 1969; Duncan and Davis, 1953; Good- 
man, 1953, 1959; Menzel, 1950; Robinson, 1950; Scheuch, 1966, 
1969), aggregation bias (Burstein, 1974, 1975a, 1975b; Feige and 
Watts, 1972; Green, 1964; Grunfield and Griliches, 1960; Theil,
1954), correlations based on grouped data (Gehlke and Biehl, 1934; 
Pearson, 1896; Thorndike, 1939; Walker, 1928; Yule and Kendall, 
1937), and grouping of observations (Burstein, 1978c; Cramer, 1964; 
Haitovsky, 1966, 1973; Firebaugh, 1978; Hannan and Burstein, 
1974; Johnston, 1972; Prais and Aitchison, 1954) all stem from 
concern about making inferences about relationships at one level 
from relationships found in data at a different level. 

Since research on educational effects involves multilevel data
(pupils, teachers, classrooms, schools, etc.), problems of cross-level
inference invariably arise. The problem of concern here is how to
answer the question, What are the conditions under which unbiased
estimates of parameters of microlevel processes can be derived from
macrolevel data?

We know a great deal about the consequences of comparisons of
models at two distinct levels of aggregation. Under contract from
the National Institute of Education (NIE), the Consortium on
Methodology for Aggregating Data in Educational Research re-
viewed, classified, interpreted, and expanded the work on change
in units of analysis problems previously associated with such sociolo-
gists as Blalock (1964) and Robinson (1950) and with such econo-
mists as Cramer (1964), Prais and Aitchison (1954), and Feige and
Watts (1972). Most of the findings from the work of the Consortium
(e.g., Burstein, 1975a, 1975b, 1978c; Burstein and Knapp, 1975;
Burstein and Linn, 1976; Hannan, 1976; Hannan, Freeman, and
Meyer, 1976; Hannan and Young, 1976a; Hannan, Nielsen, and
Young, 1975) are directed specifically to the estimation of para-
meters from structural regression models. However, the literature on
the effects of cross-level inference in experiments (Glendening, 1976)
and on categorical data (Goodman, 1959; Shively, 1969; Iversen,
1973; Maw, 1976) reflects similar trends.

The main findings about the differences between regression
models at two levels can be summarized as follows:

1. Consistency of Estimation: Estimates of regression coefficients
from different levels of analysis are inconsistent (asymptotically
biased) unless groups are formed randomly, on the basis of the
values of the regressors (grouping on the independent variable) or
on the disturbances (assuming that the disturbances are uncor-
related with the regressors).

2. Determinants of Consistency: The magnitude of the differences
between coefficients from individual and group level data is a
function of the relationship of the grouping variable(s) (rule,
methods) to the regressors, to the dependent variable net of the
regressors, and to the ratios of the variances of the regressors
at the two levels of analysis. Grouping directly on the outcome
variable yields particularly poor results.


3. Specification Bias and Inconsistency: Differences in coefficients
across levels of inference are a clear indication of specification bias
(the deletion of causally relevant regressors correlated with other
regressors in the model) except in the special case when groups are
formed directly on the basis of values of the dependent variable.

4. Efficiency of Estimation: Even when consistent, estimates of coef-
ficients from one level of analysis are inefficient for estimating
coefficients from another level unless observations are grouped
according to values of the regressors.

5. Generality of Principles: The above principles appear to be ap-
plicable to both simple and multiple regression models, to non-
recursive models, and to longitudinal models.

6. Variable Efficiency in Multiple Regression: In multiple regression
models, estimates of coefficients are more efficient for the regres-
sors that determine group membership than for the other re-
gressors in the model.

7. Effects of Collinearity: Collinearity among regressors seriously
affects the consistency and efficiency of estimation across levels.

8. Aggregation Gain: Aggregation gain is possible in at least two
special cases: when grouping minimizes grouped variation in con-
founding variables and when regressors at the lower level of ag-
gregation are measured with error.

9. Preconditions for Assessing Group Effects: Knowledge of the
process that groups observations and the nature of the substantive
problem and research design is crucial to the determination of the
consequences of grouping.


Theoretical Results for Different Types of Grouping 
The reader who is willing to accept the above summary or who is
concerned only with the practical effects of cross-level inference for
research on the effects of education is encouraged to skip to the
"Empirical Illustrations" section below. For others, I provide further
explanation of the summarized findings by considering the effects
of misspecification and grouping on a two regressor linear model:


$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + u, \tag{3.1}$$

where the disturbance, $u$, has mean zero, constant variance
$\sigma_u^2$, and is asymptotically uncorrelated with the regressors
$X_1$ and $X_2$. The least-squares estimates, $b_1$ and $b_2$, from

$$\hat{Y} = a + b_1 X_1 + b_2 X_2 \tag{3.2}$$

are consistent estimators of $\beta_1$ and $\beta_2$ in equation (3.1).

If the investigator fails to include $X_2$ and instead estimates the
model

$$Y = \alpha' + \beta_1' X_1 + w, \tag{3.3}$$

then, following Theil (1971), the least-squares estimator, $b_1'$, of
$\beta_1'$ converges asymptotically to

$$\operatorname{plim}(b_1') = \beta_1 + \beta_2 b_{21}, \tag{3.4}$$

where $b_{21}$ is the coefficient in the sample from the regression of
$X_2$ on $X_1$. As long as $X_1$ and $X_2$ are correlated, least squares
applied to equation (3.3) yields inconsistent estimates of $\beta_1$ in
equation (3.1). The magnitude of this discrepancy,

$$\operatorname{plim}(b_1') - \beta_1 = \beta_2 b_{21}, \tag{3.5}$$

is called the specification bias of $b_1'$ as an estimator of $\beta_1$.

When observations on individuals are grouped into $m$ groups (for
simplicity we assume equal size groups of size $n$), the group level
analogues to equations (3.1) and (3.3) can be written as

$$\bar{Y} = \bar{\alpha} + \bar{\beta}_1 \bar{X}_1 + \bar{\beta}_2 \bar{X}_2 + \bar{u} \tag{3.6}$$

$$\bar{Y} = \bar{\alpha}' + \bar{\beta}_1' \bar{X}_1 + \bar{w}. \tag{3.7}$$

Least squares applied to equation (3.7) yields an estimator
$\bar{b}_1'$ such that

$$\operatorname{plim}(\bar{b}_1') = \bar{\beta}_1 + \bar{\beta}_2 \bar{b}_{21}. \tag{3.8}$$

As with the individual level models, when $\bar{X}_1$ and $\bar{X}_2$ are
correlated, $\bar{b}_1'$ is an inconsistent estimator of $\bar{\beta}_1$
with discrepancy

$$\operatorname{plim}(\bar{b}_1') - \bar{\beta}_1 = \bar{\beta}_2 \bar{b}_{21}. \tag{3.9}$$

By comparing the least-squares estimators from various combinations
of the two individual level models (equations [3.1] and [3.3])
and the two group level models (equations [3.6] and [3.7]), we can
illustrate all the results on bias (inconsistency) in estimation from
grouped observations. Three distinct sets of circumstances warrant
discussion: (1) group level regressors and disturbances uncorrelated,
(2) group level regressors and disturbances correlated, and (3) grouping
with a misspecified model.
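
The specification bias in equations (3.4) and (3.5) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the sample size, the coefficient values (beta1 = 1.0, beta2 = 0.5), the slope of 0.6 linking X2 to X1, and the helper names `slope` and `simulate` are all illustrative assumptions.

```python
import random


def slope(x, y):
    """Least-squares slope from the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta1=1.0, beta2=0.5, seed=0):
    """Return (b1' from the short regression, beta1 + beta2 * b21)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: X2 is correlated with X1 (population slope of X2 on X1 is 0.6).
    x2 = [0.6 * a + rng.gauss(0, 1) for a in x1]
    y = [beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    b1_short = slope(x1, y)   # b1' from the misspecified model (3.3)
    b21 = slope(x1, x2)       # sample regression of X2 on X1
    return b1_short, beta1 + beta2 * b21


b1_short, predicted = simulate()
print(round(b1_short, 3), round(predicted, 3))
```

Under these assumed values the two printed numbers should nearly coincide and both should sit well above beta1, which is the pattern equation (3.5) describes.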


Case A: Group Level Regressors and Disturbances Uncorrelated.
If the assignment of individuals to groups does not create a
correlation between the regressor(s) and the disturbance (that is,
between $\bar{X}_1$, $\bar{X}_2$, and $\bar{u}$ in equation (3.6) or
between $\bar{X}_1$ and $\bar{w}$ in equation (3.7)), then the parameters
from the group level and individual level models will be the same.
The implications of the above for estimation can be stated as follows:

If $\bar{X}_1$ and $\bar{X}_2$ are uncorrelated with $\bar{u}$, and
$\bar{X}_1$ and $\bar{w}$ are uncorrelated, then

$$\bar{\beta}_1 = \beta_1, \quad \bar{\beta}_2 = \beta_2, \quad \bar{\beta}_1' = \beta_1', \tag{3.10}$$

and correspondingly,

$$\bar{b}_1 = b_1, \quad \bar{b}_2 = b_2, \quad \bar{b}_1' = b_1'.$$


The kinds of grouping practices that ensure that the group level
regressors are uncorrelated with group level disturbances include
random grouping, grouping by the regressors ($X_1$ or $X_2$ in equation
[3.1] or $X_1$ in equation [3.3] if $X_1$ and $X_2$ are uncorrelated), or
grouping by the disturbances ($u$ in equation [3.1], $w$ in equation
[3.3]). Several investigators have demonstrated the above by a
variety of different analytical approaches (Blalock, 1964; Burstein,
1974, 1975b, 1978c; Cramer, 1964; Feige and Watts, 1972; Firebaugh,
1978; Hannan, 1971; Hannan and Burstein, 1974).

Note that under random grouping and grouping by the disturbances,
there may be no between group variation in the regressors
(Cronbach, 1976). If the latter occurs, the denominators of
$\bar{b}_1$, $\bar{b}_2$, and $\bar{b}_1'$ go to zero, and the
corresponding estimates cannot be determined. In practice, pure cases
of random grouping and especially grouping by the disturbance occur
only rarely.

The results on efficiency of least-squares estimators in equations
(3.6) and (3.7) are also well known (Burstein, 1975b; Cramer, 1964;
Feige and Watts, 1972; Hannan and Burstein, 1974; Prais and
Aitchison, 1954). Random grouping reduces both systematic (associated
with $X_1$ and $X_2$ in equation [3.1]) and error variation ($u$ in
equation [3.1], $w$ in equation [3.3]). Thus, random grouping is
considerably more damaging to the efficiency of least squares than is
grouping by $X$. In finite populations (of $N$ observations) with $m$
equal size groups, the efficiency of random grouping is approximately
$(m - 1)/(N - 1)$.

In terms of efficiency, grouping by the disturbances only is worse
than random grouping. Since $u$ and $X$ are uncorrelated, variation in
$X$ is decreased as in random grouping. But, in addition, variation in
$\bar{u}$ is maximized; $\sigma_{\bar{u}}^2 = \sigma_u^2$. Both of these
effects reduce efficiency.

Grouping by the regressors maximizes the variation in $X$ and
thereby minimizes the information loss through grouping (Cramer,
1964; Prais and Aitchison, 1954). In fact, grouping by the regressors
is optimal in the sense that no other grouping method can yield
group level estimators with smaller variances.
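
The contrast between random grouping and grouping by the regressor can be illustrated with a small Monte Carlo sketch. It is offered only as an illustration under assumed values (N = 1000 observations, m = 20 equal size groups, a true slope of 1.0, unit variances); the function names are hypothetical. Because the groups are of equal size, the unweighted regression on group means coincides with the weighted one.

```python
import random
import statistics


def slope(x, y):
    """Least-squares slope from the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def group_means(values, order, m):
    """Means of m equal-size groups formed by cutting `order` into blocks."""
    size = len(order) // m
    return [sum(values[i] for i in order[g * size:(g + 1) * size]) / size
            for g in range(m)]


def grouped_slopes(rng, N=1000, m=20, beta=1.0):
    """Group level slopes from one sample under two grouping rules."""
    x = [rng.gauss(0, 1) for _ in range(N)]
    y = [beta * a + rng.gauss(0, 1) for a in x]
    by_x = sorted(range(N), key=lambda i: x[i])   # grouping by the regressor
    at_random = list(range(N))
    rng.shuffle(at_random)                        # random grouping
    return {name: slope(group_means(x, order, m), group_means(y, order, m))
            for name, order in (("by_x", by_x), ("random", at_random))}


def compare(reps=200, seed=1):
    """Mean and standard deviation of the group level slope over replications."""
    rng = random.Random(seed)
    draws = [grouped_slopes(rng) for _ in range(reps)]
    return {name: (statistics.mean([d[name] for d in draws]),
                   statistics.stdev([d[name] for d in draws]))
            for name in ("by_x", "random")}


for name, (mean, sd) in compare().items():
    print(name, round(mean, 3), round(sd, 3))
```

Both rules should center on the true slope, but the spread under random grouping should be several times larger, which is the efficiency loss discussed above.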

When data are grouped by a subset of the regressors (e.g., by $X_1$ in
equation [3.1]), the discussion of efficiency of grouping by the
regressor requires further elaboration. While group level estimators
of parameters remain consistent when grouping is based on a
subset of the regressors (Burstein, 1965b; Haitovsky, 1966, 1975;
Hannan and Young, 1976a), the estimators for the regressors
determining group membership are much more efficient than the
remaining estimators. That is, if observations are grouped by $X_1$ in
equation (3.1), then the efficiency of $\bar{b}_1$ as an estimator of
$\beta_1$ is much superior to the efficiency of $\bar{b}_2$ as an
estimator of $\beta_2$.

The effect of grouping by a subset of regressors can be seen in
the data from the Houthakker and Haldi study presented in
Haitovsky (1966, 1973). The data are from the regression of automobile
purchases, $Y$, on income, $X_1$, and initial automobile inventory, $X_2$.
When the data are grouped by $X_1$, $SE(\bar{b}_2)$ is larger than
$SE(\bar{b}_1)$, even though for the ungrouped data
$SE(b_2) < SE(b_1)$. Similarly, when the data are grouped by
inventory, $X_2$, only $SE(\bar{b}_1) > SE(\bar{b}_2)$.

Hannan and Young (1976a) found that when data were grouped
by $X_1$, the efficiency of $\bar{b}_2$ was only slightly better than for
random grouping. They attribute the inefficiency of the estimator to
the opposing influences of the correlation of $X_1$ and $X_2$ ($r_{12}$)
on $\bar{r}_{12}$ and on $\operatorname{var}(\bar{X}_2)$. Hannan and Young
(1976a) define the relative efficiency of $\bar{b}_2$ to be

$$\operatorname{eff}(\bar{b}_2, b_2) = \frac{\operatorname{var}(b_2)}{\operatorname{var}(\bar{b}_2)} = \frac{1 - \bar{r}_{12}^2}{1 - r_{12}^2} \cdot \frac{\operatorname{var}(\bar{X}_2)}{\operatorname{var}(X_2)}.$$

As $r_{12}$ increases, the first ratio on the right-hand side decreases
while the second ratio increases. Hannan and Young (1976a) conclude that
the two tendencies offset each other, so that the efficiency of
$\bar{b}_2$ is no better than random grouping.

Before leaving this case, it is useful to note one instance in which
grouping by the regressors actually improves estimation. Under
classical assumptions about errors in variables, when regressors are
fallibly measured (i.e., the $X$s are measured with error), least-squares
estimation from the ungrouped fallible observations yields attenuated
estimates of the true parameters ($r_{xx'}\beta$ rather than $\beta$, where
$r_{xx'}$ is the reliability of $X$). It has been shown (Bartlett, 1949;
Madansky, 1959; Wald, 1940) that if groups are formed on the true
values of the regressors, the group level estimator, $\bar{b}$, of
$\beta$ has smaller bias than the ungrouped estimator, $b$, based on
fallible measures.
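
The attenuation result, and its removal when groups are formed on the true values, can be sketched as follows. This is an illustration only: it assumes a reliability of 0.5, a true slope of 1.0, and, unrealistically, that the true values of the regressor are available for forming groups. The names `slope` and `simulate` are hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope from the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(N=20000, m=20, beta=1.0, seed=2):
    """Return (ungrouped slope on fallible X, grouped slope on fallible X)."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0, 1) for _ in range(N)]
    # Fallible measure: true score plus error of equal variance (reliability 0.5).
    obs_x = [t + rng.gauss(0, 1) for t in true_x]
    y = [beta * t + rng.gauss(0, 1) for t in true_x]
    ungrouped = slope(obs_x, y)
    # Groups are formed on the TRUE values (assumed known for illustration only).
    order = sorted(range(N), key=lambda i: true_x[i])
    size = N // m
    gx = [sum(obs_x[i] for i in order[g * size:(g + 1) * size]) / size
          for g in range(m)]
    gy = [sum(y[i] for i in order[g * size:(g + 1) * size]) / size
          for g in range(m)]
    return ungrouped, slope(gx, gy)


ungrouped, grouped = simulate()
print(round(ungrouped, 3), round(grouped, 3))
```

With these assumptions the ungrouped slope should fall near half the true value, while the grouped slope should fall near the true value because measurement error averages out within groups.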

Several investigators (Aigner and Goldfeld, 1974; Blalock, Wells,
and Carter, 1970; Hannan, 1976) have provided evidence that
grouping on the fallible regressor (rather than on the true regressor)
can also yield estimates with smaller bias than the ungrouped
estimator. If it is true that grouping on the fallible regressor yields
better estimators, it may be practically possible to have an aggregation
gain (Grunfeld and Griliches, 1960; Hannan, 1976). While it is not
feasible to determine the true values of the regressor from fallible
observed values, an investigator can systematically plan to group on
observed values of the regressor and thereby at least partially
disattenuate his fallible ungrouped estimates.


Case B: Correlation Between Group Level Regressors and Dis- 
turbances. It is possible that the assignment of individuals to groups 
may cause the group level regressors and disturbances to be cor- 
related. The example of this case that is most frequently cited is 
grouping by Y (Blalock, 1964; Burstein, 1975b; Feige and Watts, 
1972; Hannan and Burstein, 1974). 

When observations are grouped by $Y$, the corresponding group
level and individual level parameters ($\bar{\beta}_1$ and $\beta_1$,
$\bar{\beta}_2$ and $\beta_2$, $\bar{\beta}_1'$ and $\beta_1'$) will no
longer be the same. As a result, the group level estimators
($\bar{b}_1$ and $\bar{b}_2$ for $\bar{\beta}_1$ and $\bar{\beta}_2$,
$\bar{b}_1'$ for $\bar{\beta}_1'$) will yield inconsistent estimates
of the ungrouped parameters ($\beta_1$, $\beta_2$, and $\beta_1'$). The
above occurs because when $Y$ is positively related to its regressors
(i.e., $\beta_1$, $\beta_2$ positive), grouping by $Y$ tends to place high
values of $X_1$, $X_2$, and $u$ ($X_1$ and $w$ for equation [3.3]) in the
same group and low values of $X_1$ and $X_2$ and $u$ in the same group.
Thus, while $X_1$ and $X_2$ are uncorrelated with $u$, $\bar{X}_1$ and
$\bar{X}_2$ will be correlated with $\bar{u}$.
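
A short simulation makes the inflation from grouping by Y concrete. It is a sketch under assumed values (one regressor, true slope 1.0, unit variances for X and u, 20 equal size groups), with hypothetical helper names. Under these assumptions the regression of X on Y has slope 0.5, so the slope of the group means of Y on the group means of X should approach roughly 2 rather than 1.

```python
import random


def slope(x, y):
    """Least-squares slope from the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(N=20000, m=20, beta=1.0, seed=3):
    """Return (individual level slope, slope from data grouped by Y)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(N)]
    y = [beta * a + rng.gauss(0, 1) for a in x]
    individual = slope(x, y)
    # Grouping by the outcome: sort on Y and cut into m equal-size groups.
    order = sorted(range(N), key=lambda i: y[i])
    size = N // m
    gx = [sum(x[i] for i in order[g * size:(g + 1) * size]) / size
          for g in range(m)]
    gy = [sum(y[i] for i in order[g * size:(g + 1) * size]) / size
          for g in range(m)]
    return individual, slope(gx, gy)


individual, grouped_by_y = simulate()
print(round(individual, 3), round(grouped_by_y, 3))
```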

Blalock (1964) described the consequences of grouping by $Y$ on
the regression coefficients ($b_{yx}$, $b_{xy}$) and on correlation
coefficients ($r_{xy}$). He pointed out that although grouping on $Y$
does not bias the estimate of $b_{xy}$ (i.e., $\bar{b}_{xy} = b_{xy}$),
assuming $b_{yx}$ and $r_{xy}$ are positive, it does inflate the
corresponding correlation coefficient (i.e., $\bar{r}_{xy} > r_{xy}$).
But since $\bar{b}_{xy}\bar{b}_{yx} = \bar{r}_{xy}^2 > r_{xy}^2 = b_{xy}b_{yx}$,
it follows that $\bar{b}_{yx} > b_{yx}$. Thus, grouping by $Y$ inflates the
regression coefficient when $Y$ is the dependent variable.

The mean-squared error for grouping by $Y$ is always greater than
the mean-squared error for grouping by $X$. Although grouping by
$Y$ may be as efficient as grouping by $X$, grouping by $Y$ always
introduces a component of bias to the mean-squared error; that is,
$\mathrm{MSE}(\bar{b}) = (\mathrm{bias}[\bar{b}])^2 + \mathrm{Var}(\bar{b})$,
where $\mathrm{Var}(\bar{b})$ is measured relative to the individual
level parameter $\beta$.

If the bias from grouping by $Y$ is small, grouping by $Y$ can
yield more efficient estimates than random grouping (Burstein,
1975b). This would occur because grouping by $Y$ is systematically
related to $X$, while random grouping is not. Admittedly, the notion
of small bias arising from grouping by $Y$ is hard to imagine. However,
it is conceivable that when few groups are formed and there is a
strong relationship between $Y$ and $X$, some improvement can occur.

Case C: Grouping and Specification Bias. In reality, the groups
in educational effects research are rarely formed on the basis of a
variable unrelated to $X$ and $Y$, on the basis of the regressors, or on
the basis of outcomes. In general, the exact mechanism controlling
grouping cannot be determined.

Students are grouped into schools as a function of their own background
([...] and occupation; student ability), the communities in which they
live (community wealth and demographic properties), and some random
[...] schools may have a different but equally complex set of [...].

If the above argument is correct, [...] microlevel educational effects
from macrolevel data [...] bound to occur. The complications can be
traced to two factors: (1) a misspecified individual level model and
(2) exacerbating the misspecification by grouping (Burstein, 1975b;
Hannan and Burstein, 1974; Hannan and Young, 1976a; Hannan, Nielsen,
and Young, 1975; Hanushek, Jackson, and Kain, 1974). We can demonstrate
this case by returning to the individual level (equations [3.1] and
[3.3]) and group level models (equations [3.6] and [3.7]) introduced
earlier.

For the case of grouping by an omitted regressor (i.e., by factors
selecting students into schools, classes, etc.), the investigator
specifies the relationship of interest to be as in equation (3.3):

$$Y = \alpha' + \beta_1' X_1 + w. \tag{3.3}$$

In so doing, he or she ignores the grouping mechanism, $X_2$, which
is correlated with both $X_1$ and $w$. That is, if the investigator had
acknowledged the potential effects of group membership, the correct
specification would be as in equation (3.1):

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + u. \tag{3.1}$$

As pointed out earlier, under these conditions the least-squares
estimator, $b_1'$ or $\bar{b}_1'$, yields inconsistent estimates of
$\beta_1$ from the correctly specified model. This bias (more accurately,
specification bias) occurs from failing to recognize the role of group
membership at the individual level.

In substantive terms, the coefficient from the individual level
(total) regression of outcome, $Y$, on the regressor, $X_1$, is the wrong
coefficient when group membership, $X_2$, is systematically related
to both $Y$ and $X_1$. Under such conditions, the coefficient of interest
should be $\beta_1$, which can be shown to be the coefficient from the
pooled within group regression of $Y$ on $X_1$ (cf., e.g., Burstein,
1975b, 1978c; Duncan, Cuzzort, and Duncan, 1961; Firebaugh,
1978; Werts and Linn, 1971). This point reappears later on.
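
The distinction between the total regression and the pooled within group regression can be illustrated as follows. This is a sketch under assumptions of my own choosing: 50 groups of 200 members, a group level factor that shifts both X1 and Y (playing the role of X2), a true individual coefficient of 1.0, and a group effect of 2.0. The helper names are hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope from the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(m=50, n=200, beta1=1.0, gamma=2.0, seed=4):
    """Return (total regression slope, pooled within-group slope)."""
    rng = random.Random(seed)
    x, y, gid = [], [], []
    for g in range(m):
        a = rng.gauss(0, 1)               # group level factor (role of X2)
        for _ in range(n):
            xi = a + rng.gauss(0, 1)      # X1 depends on group membership
            x.append(xi)
            y.append(beta1 * xi + gamma * a + rng.gauss(0, 1))
            gid.append(g)
    total = slope(x, y)
    # Pooled within-group regression: deviate X1 and Y from their group means.
    xbar = [sum(x[g * n:(g + 1) * n]) / n for g in range(m)]
    ybar = [sum(y[g * n:(g + 1) * n]) / n for g in range(m)]
    xw = [xi - xbar[g] for xi, g in zip(x, gid)]
    yw = [yi - ybar[g] for yi, g in zip(y, gid)]
    return total, slope(xw, yw)


total, within = simulate()
print(round(total, 3), round(within, 3))
```

Under these assumptions the within group slope should recover the individual coefficient, while the total slope should absorb part of the group effect.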


Earlier, the bias from using $b_1'$ to estimate $\beta_1$ was said to be

$$\operatorname{plim}(b_1') - \beta_1 = \beta_2 b_{21}. \tag{3.5}$$

Note that this expression is zero when $X_2$ is unrelated either to
$X_1$ ($b_{21} = 0$) or to $Y$ after controlling for $X_1$ ($\beta_2 = 0$).
Under these circumstances, the disturbance in equation (3.3),

$$w = \beta_2 X_2 + u,$$

is uncorrelated with $X_1$, and so no specification bias occurs.

When individuals are grouped by $X_2$ while it is incorrectly assumed
that equation (3.3) is the underlying model to be estimated, the
effects of misspecification are compounded by the effects of grouping.
We would be erroneously using $\bar{b}_1'$, the estimator for
$\bar{\beta}_1'$ in equation (3.7), as an estimator of $\beta_1$ in
equation (3.1). The resulting bias is

$$\operatorname{plim}(\bar{b}_1') - \beta_1 = (\beta_1 + \beta_2 \bar{b}_{21}) - \beta_1 = \beta_2 \bar{b}_{21}.$$

Since groups are formed on the basis of $X_2$, $\bar{b}_{21} > b_{21}$ (see
the discussion of grouping by $Y$ under Case B). Thus, the bias from
estimating $\beta_1$ from grouped data (i.e., by $\bar{b}_1'$) when the
grouping variable is positively related to both the regressor and the
outcome and has been left out of the individual level regression is
larger than the specification bias alone. That is,

$$\operatorname{plim} \bar{b}_1' - \beta_1 > \operatorname{plim} b_1' - \beta_1,$$

$$\beta_2 \bar{b}_{21} > \beta_2 b_{21},$$

as long as $\beta_2$ and $b_{21}$ are positive.
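
The compounding of specification bias by grouping on the omitted regressor can be sketched numerically. The values below (beta1 = 1.0, beta2 = 0.5, a slope of 0.6 linking X2 to X1, 20 equal size groups) and the helper names are assumptions chosen for illustration, not quantities taken from the studies cited.

```python
import random


def slope(x, y):
    """Least-squares slope from the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(N=20000, m=20, beta1=1.0, beta2=0.5, seed=5):
    """Return (individual short-regression slope, grouped short-regression slope)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(N)]
    x2 = [0.6 * a + rng.gauss(0, 1) for a in x1]    # omitted regressor
    y = [beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    short = slope(x1, y)                            # b1' at the individual level
    # Group on the omitted regressor X2, then regress group means of Y
    # on group means of X1 only.
    order = sorted(range(N), key=lambda i: x2[i])
    size = N // m
    gx1 = [sum(x1[i] for i in order[g * size:(g + 1) * size]) / size
           for g in range(m)]
    gy = [sum(y[i] for i in order[g * size:(g + 1) * size]) / size
          for g in range(m)]
    return short, slope(gx1, gy)


short, grouped_short = simulate()
print(round(short, 3), round(grouped_short, 3))
```

With these assumed values the individual level estimate should exceed beta1 by the specification bias alone, and the grouped estimate should exceed it by considerably more.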
The educational researcher is likely to be in the case described
above when attempting to make inferences about microlevel relations
from macrolevel data. It is exceedingly rare to find an investigation
of educational effects in which all the necessary variables affecting
outcomes have been identified and accurately measured. If these
conditions (proper specification and measurement) are met, grouping
does not bias estimation. When they are not, the investigator can
only take some small comfort in the realization that the differences
between macrolevel and microlevel estimates suggest that the model
was most likely misspecified (because of the exclusion of
regressors representing group differences or because of poorly
measured regressors) in the first place.


Empirical Illustrations

The consequences of cross-level inferences described above are
illustrated, in terms of bias and efficiency, by examples drawn from
[...]. These illustrations are followed by examples from actual
research on school effects.

Illustrations over Different Types of Grouping. Burstein (1974,
1975a, 1975b, 1978c) and Hannan and Burstein (1974) present
data from entering freshmen in a large midwestern university to
illustrate the consequences of the various kinds of grouping. A
subset of these data are used for our present purposes; the reader
is referred to Burstein (1975b) for a more detailed discussion.

The basic results from estimating the simple linear regression of
the total score on an entering achievement test battery (ACH)
on total score on the Scholastic Aptitude Test (SAT) from grouped
data are contained in Table 3-1. The estimate of the individual
level regression is

ACH = .839 SAT, SE(b) = .011.

(Note that all variables were initially standardized at the individual
level.)

The present model corresponds to equation (3.3). The microlevel
observations on ACH and SAT were grouped by the variable
indicated in Table 3-1, and weighted regressions of the group means
on ACH on the group means on SAT were calculated. The grouping
variables were classified on the basis of their known relationships
to ACH and SAT according to the following types: random grouping,
grouping by the included regressor ($X_1$), grouping by an omitted
regressor ($X_2$) that is uncorrelated with $Y$ after controlling for
$X_1$, grouping by $Y$, and grouping by an omitted regressor ($X_2$)
that is correlated with both $X_1$ and $Y \cdot X_1$ ($Y$ controlling for
$X_1$).

For each grouping variable, Table 3-1 presents the number of
groups formed; the group level estimates, $\bar{b}_1'$; their standard
errors; the estimated standard error of $\bar{b}_1'$ as an estimator of
$\beta_1$; and the partial regression coefficient $b_{YX_1 \cdot X_2}$. The
points made earlier about the different cases are illustrated. The
"advantage" of grouping on the regressor (SAT2) over virtually all
other methods is evident. The efficiency of random grouping (ID2 forms
about nine times as many groups as does any other variable) is also
apparent. Grouping by $Y$ (ACH2) yields particularly poor estimates;
even so, its mean-squared error approximates random grouping with the
same number of groups.

If one is fortunate enough to group by some variable that is
unrelated to $Y$ after controlling for the included regressors, group
level estimates are typically very good. But when grouping is by some
characteristic related to both $Y$, controlling for $X_1$, and $X_1$ (a
misspecified microlevel model), group level estimation can be disastrous.

By comparing $b_1'$ and $\bar{b}_1'$ to $b_{YX_1 \cdot X_2}$, we gain some
idea of the relative impact of specification bias when compounded by
grouping. In every reasonable case (excluding random grouping and
grouping on $X_1$), grouping exacerbates the bias due to misspecification,
assuming, of course, that the model including both $X_1$ and $X_2$
is more correctly specified.

Examples from School Effects Research. Studies of educational
effects have been conducted at many levels and with data from
multiple levels. Averch et al. (1972) provide short descriptions of
a wide range of studies. The most frequently cited results from the
Coleman Report (Coleman et al., 1966) are based on between
school analyses and a mixed model with individual level measures
of pupil outcomes and pupil backgrounds and aggregate measures
of school and teacher characteristics. The Six Subject Surveys
conducted by the International Association for the Evaluation
of Educational Achievement (IEA) (Comber and Keeves, 1973;
Peaker, 1975) report similar analyses.

Both the Coleman Report and the IEA studies employed analytical models that
decrease the likelihood of identifying important teacher, classroom, 
and school characteristics (Bidwell and Kasarda, 1977; Burstein, 
1976b; Burstein and Miller, 1978; Burstein and Smith, 1977). Data 
from students were not matched with the characteristics of their own 
teachers in either study. This practice by itself could limit the 
possibility of capturing the educational process as it affects indi- 
vidual students—except in the unlikely event that the process is 
uniform within and across classrooms within a given school. More- 
over, when teacher variables such as experience and instructional 
practices are measured only at the school aggregate level, they 
are distal school resource characteristics that can be expected to 
behave like such global school characteristics as books in the library, 
overall per pupil expenditures, and characteristics of the principal. 

In general, between school analyses tend to accentuate the effects 
of background, because students from similar backgrounds tend to 
be grouped together. That is, at the school level, aggregate student
background measures reflect community characteristics that, at least
for the United States, determine to a large degree the resources
(per pupil expenditures, teacher quality, etc.) available to run school
programs. Moreover, communities with high quality school programs
attract families from all social class levels who are motivated
enough to move to obtain higher quality education. Whatever the
mechanism, this aggregation process does not reduce background
variation to the extent that it does v


characteristics. As a result
in between school analyses

Table 3-2. Between School and Between Student Regression Analyses of
the Factors Affecting Science Achievement (RSCI) for U.S. Fourteen
Year Olds in the IEA Study.

                 Between       Between       Differences   Correlation
Variable         Students      Schools       in Effects    Ratio

SEX              -4.157        -6.620        -2.463        .09
                 (11.87)a      (3.90)
RWK              .861          .876          .015          .21
                 (23.22)       (6.47)
POPOCC           [illegible]   .843          .856          .23
                 [illegible]   (2.87)
BOKHOM           [illegible]   3.577         .916          .16
                 [illegible]   (3.37)
GRADE            [illegible]   1.390         -.522         .41
                 [illegible]   (1.69)
SCISTUDY         .066          .110          .040          .25
                 (4.29)        (2.34)
EXPLORE          .130          .240          .110          [illegible]
                 (2.52)        (1.50)
R²               .39           .72
N observations   1,806         107

Note: The variables included are: raw total science score (RSCI), raw
word knowledge score (RWK), sex of student (SEX), student's report of
father's occupation (POPOCC), student's report of number of books
in the home (BOKHOM), grade in school (GRADE), student's report
of exposure to science (SCISTUDY), student's report of degree of use
of exploration in science instruction (EXPLORE).

a. t statistics are in parentheses.

Source: Burstein, Fischer, and Miller, 1978.


It should be noted that the coefficient for RWK, a measure of
verbal ability, remains relatively stable across levels.
Perhaps the grouping mechanism that allocates students to
schools does not distort the estimate of the relationship of verbal
ability to science achievement (though evidence presented below
calls this interpretation into question).

Other studies with analyses at multiple levels provide further
evidence of the consequences of grouping. Haney (1974) and Hannan,
Freeman, and Meyer (1976) report empirical analyses at three levels.
As expected from the theoretical results on analyses at different
levels, the coefficients at the various levels in the Haney and the
Hannan et al. reports differ in magnitude, different variables enter
models at different levels, and aggregation generally inflates the
estimated effects of background relative to the effects of school
factors.


In an analysis at the student level with school aggregate teacher 
and classroom characteristics and global school characteristics, 
the aggregate measures begin with a disadvantage relative to the 
student level background factors. Since the former are measured 
only at the school level, they can influence only the mean outcomes 
for the school, which account for about 20-25 percent of the overall
variation in student achievement ($\eta^2_{RSCI} = .23$ and
$\eta^2_{RWK} = .21$ in Table 3-2). In contrast, the individual level
student background measures can be associated with within school variation
in outcome as well as with between school variation. This explan- 
ation provides an additional technical consideration that can help 
explain the poor showings of school level measures of the education 
a student receives. But these explanations are not likely to explain 
away the typically strong effects of background and the weak 
effects of schooling. The remainder of the truth lies elsewhere. 
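
The share of outcome variation lying between schools is the correlation ratio, the between group sum of squares divided by the total sum of squares. A minimal sketch of that computation follows; the function name and the tiny data set are made up for illustration and are unrelated to the IEA data.

```python
def eta_squared(groups):
    """Correlation ratio: between-group sum of squares over total sum of squares."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    between_ss = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return between_ss / total_ss


# Made-up scores for two hypothetical schools of three students each.
scores_by_school = [[1, 2, 3], [4, 5, 6]]
print(round(eta_squared(scores_by_school), 3))  # 0.771
```

A variable measured only at the school level can account for at most this share of the individual level variation, which is the ceiling referred to above.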


Concluding Comments about 
Cross-Level Inferences 


Both the theoretical underpinnings and realistic examples of the 
consequences of attempting to estimate parameters of microlevel 
processes from macrolevel data have been presented. If one agrees 
with Averch et al. (1972) that “the researcher would like to examine 
the relationships among the school resources an individual student 
receives, his background, and the influences of his peers on one 
hand and his educational outcome on the other" (p. 38), then 
individual level data or at least the assurance that the aggregation
process does not distort the relationships among important variables
are necessary.


In general, it appears to be impossible to avoid distortion through
aggregation in school effects research if microlevel processes are of
interest. The grouping mechanism that allocates students to schools
and to classrooms within schools is just too complex to specify
adequately. Therefore, when the purpose is to identify factors
affecting individual student achievement, I would discourage
cross-level inference. Thus, the many studies that use either schools
(Katzman, 1968; Thomas, 1962), school districts (Bidwell and
Kasarda, 1975; Kiesling, 1969),

1974) as units of analysis are 
of corresponding effects in stu 


» Hanushek, 1972 




We shall later place some conditions on the assertions in the last 
paragraph. Nevertheless, in order to conduct an adequate investi- 
gation of educational effects on individual student performance, 
it appears that the investigator must measure every variable at 
its lowest possible level and be able to match each student's data 
with the data from the teacher, classroom, classmates, and school 
(Burstein, 1975a, 1975b; Burstein and Smith, 1977). Otherwise, 
the study of the effects of education on individual students might 
as well be forgotten. 


GROUP-STRUCTURAL-CONTEXTUAL 
COMPOSITIONAL EFFECTS 


In this section, we consider the issues and problems that occur
when macrolevel variables are specified to affect microlevel outcomes
in an analysis of educational effects. This study of the effects
of properties of groups or collectives on individuals is generally
called "contextual analysis" (Lazarsfeld and Menzel, 1961).

Since properties of teachers (e.g., instructional practices),
classrooms (e.g., availability of aides), and schools (e.g., variety of
program offerings) are at a macrolevel with respect to students within
classes and schools (the microlevel for present purposes), analyses
that mix student level outcomes with group level independent
variables are prevalent. The between student analyses in the Coleman
Report (Coleman et al., 1966) and in the IEA studies (Comber
and Keeves, 1973) were carried out in this manner. Indeed, most
educational effects research involves specifications with group
level explanatory variables. As the term is used by sociologists and
economists, "school effects" research (Alwin, 1976; Alexander and
Eckland, 1975; Coleman et al., 1966; Hauser, Sewell, and Alwin,
1976) generally involves this type of analysis. The terms "teacher
effects" and "classroom effects" in the research on teaching
literature also typically refer to models involving macrolevel
explanatory factors.


Inconsistencies in Terminology

The juxtaposition of the four "visually distinguishable" modifiers of
"effects" in the section title reflects the definitional confusion
that abounds in the social science literature on models that
incorporate variables from multiple levels. There have been numerous
attempts to disentangle the morass implied by the four terms (group,
structural, contextual, compositional), beginning with the studies of
the American soldier reported by Stouffer et al. (1949) and winding
through the so-called Columbia School tradition (see, e.g., Lazarsfeld
and Menzel, 1961), several chapters in the compendium on quantitative
ecological analysis in the social sciences edited by Dogan and Rokkan
(1969, see especially Valkonen [1969]), and recent papers by
Firebaugh (1977) and by Karweit, Fennessey, and Daiger (1978).

Unfortunately, the various clarifying papers also disagree on 
terminology. For the time being, we shall adopt a compromise set of 
meanings. "Group effects" and “structural effects" (Blau, 1960) 
shall be used interchangeably to denote the generic effects of macro- 
level properties on individual level behavior. Both global (macrolevel 
properties not based on aggregation of individual characteristics—e.g., 
teacher sex for a classroom) and aggregate properties of groups 
(classrooms, schools, etc.) are subsumed under these terms. 

"Contextual effects" and "compositional effects" will denote the
effects of aggregate properties of groups only. Initially, contextual
effects will be measured by the effect on microlevel outcomes of the
group mean of a microlevel explanatory variable net of the microlevel
values on the corresponding variable (Alwin, 1976; Farkas,
1974; Firebaugh, 1977; Hauser, 1970, 1971, 1974). That is, a
contextual effect for ability is said to occur when group mean ability
($\bar{X}$) is related to individual outcomes ($Y$) after controlling for
individual ability ($X$). The term compositional will refer to a more
general set of operationalizations of aggregate properties (e.g., Davis,
Spaeth, and Huson, 1961; Valkonen, 1969).
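
The operational definition of a contextual effect given above amounts to a regression of the individual outcome on individual ability and group mean ability. The sketch below simulates such data and recovers both coefficients. The group structure (100 groups of 50), the coefficient values (1.0 for individual ability, 0.5 for the group mean), and the function names are assumptions made for illustration.

```python
import random


def two_regressor_ols(x1, x2, y):
    """Least-squares slopes from regressing y on x1 and x2 with an intercept."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


def simulate(m=100, n=50, b_ind=1.0, b_ctx=0.5, seed=6):
    """Return estimated (individual effect, contextual effect)."""
    rng = random.Random(seed)
    x, xbar, y = [], [], []
    for _ in range(m):
        a = rng.gauss(0, 1)                       # group level shift in ability
        members = [a + rng.gauss(0, 1) for _ in range(n)]
        mean = sum(members) / n                   # group mean ability
        for xi in members:
            x.append(xi)
            xbar.append(mean)
            y.append(b_ind * xi + b_ctx * mean + rng.gauss(0, 1))
    return two_regressor_ols(x, xbar, y)


ind_hat, ctx_hat = simulate()
print(round(ind_hat, 3), round(ctx_hat, 3))
```

A nonzero coefficient on the group mean, net of individual ability, is what the text calls a contextual effect; whether it reflects a substantive group process or selection is a separate question.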

The methodological discussion of contextual effects considers two
additional types of effects: individual and frog-pond. Individual
effects refer to the impact of microlevel properties ($X$) on microlevel
outcomes ($Y$) in models with variables from mixed levels. Individual
ability, $X_{ij}$ (where $i$ = group, $j$ = person within group), is such a
variable in models with group mean ability, $\bar{X}_{i\cdot}$, and
individual outcome, $Y_{ij}$.
The frog-pond or comparison effect is based on "the key social
psychological principle . . . that success is judged by relative standing
in the social group" (Davis, 1966:25). Thus, an individual's
performance is a function of one's relative standing in the group
(classroom, school) to which one belongs. Davis (1966) suggests that a
student's performance may [...] by ranking relatively higher [...]
in a small pond."

Variations on the frog-pond concept permeate a variety of substantive
work on the social psychological effects of status within
groups. The notion is basic to reference group theory (e.g., Merton
and Kitt, 1950), to social [...], to social evaluation theory
(Pettigrew, 1[...]), [...] hypothesis in the [...] (Alexander and
Eckland, 1975; Davis, 1966; Hauser, Sewell, and Alwin, 1976; Meyer,
1970; St. John, 1971).

Typically, frog-pond effects in educational effects research are
measured by the impact of the individual's deviation from
the mean of his or her group ($X_{ij} - \bar{X}_{i\cdot}$) on individual
level outcomes ($Y_{ij}$). An example would be the effect of a student's
ability relative to group mean ability on the student's performance
on an [...]

Specification and Identification of Group Effects

We turn now to the specification and identification of group and
contextual effects in research on the effects of education. Group
effects on individual educational outcomes have been demonstrated
repeatedly. Affirmative answers to questions about school-to-school and
classroom-to-classroom differences in individual outcomes are evidence
of group effects.


What is left unmentioned by the above, however, is the source of the
between group differences. On the one hand, between group differences
in outcomes may be attributable to properties of the groups themselves
or to processes within the groups. On the other hand, they may simply
be a function of the “selection” of individuals into groups (Cronbach,
1976). Selection effects in education can occur in several ways. For
example, schools may systematically establish selective admission
procedures, thereby increasing their chances of a high level of student
attainment (e.g., higher average achievement).

The predominance of local control and financing of education in the
United States also leads to “natural” selection effects. These occur
because of the symbiotic relationship between community wealth and the
quality and motivation of the students available. High wealth areas
attract achievement-oriented families. At the same time, communities
with highly educated (and wealthy) families can invest more in their
children and their schools in terms of both time and money. Whatever
the mechanism, the result is the same: schools differ in the mean
backgrounds (entering ability, socioeconomic characteristics) of their
students. And typically, these mean entering differences translate into
between school differences in outcomes that may have little to do with
the quality of the school's educational program.

If selection factors can account for between school and between 
classroom differences, other substantive interpretations of school 
effects are unwarranted. There is one important caveat to this
statement, however. If the effects of classrooms and schools are
correlated with the effects of selection into groups (e.g., ability
determines assignment to classrooms, but the classroom effect is also
greater for higher [lower] ability students than for lower [higher]
ability students), partitioning the group effect into causal
(substantive) and spurious (due to assignment into groups) components
is virtually impossible. This confounding of group membership with school
resource variables (the educational “treatment”) is the source of 
much of the methodological controversy surrounding Coleman et 
al.’s (1966) interpretation of the relative influence of school and 
home background on achievement. The IEA studies experienced 
similar difficulties in interpretation (Peaker, 1975; Schwille, 1975). 
The problems associated with the relationship between assignment 
to groups and “treatment” in quasi-experiments with nonequivalent 
control group designs also fall within this domain (Cronbach, 1976; 
Cronbach, Rogosa, Floden, and Price, 1977). 

The usual way to remove selection or grouping mechanisms as an
explanation of group effects is to specify the grouping mechanism
within the model. That is, if assignment to schools is on the basis of
socioeconomic background, the model used to estimate school effects
should also adequately specify socioeconomic background (the “inputs”
to the school). While such models do not solve the problem of
correlation between the selection and causal effects of group
membership, the effects attributed to causal mechanisms in such models
are typically conservative. So the existence of significant group
effects (after controlling for selection) is impressive evidence that
some causal mechanism of the group (school, classroom), as yet
unspecified, is operating on the student.


Empirical Illustrations. Data from three studies (Tables 3-3 to 3-5)
are provided to illustrate the magnitude of school and classroom
effects. The first two sets of results come from a nested analysis of
variance model (pupils within classrooms within schools), given in
equation (3.11).

Table 3-3. Proportion of Total Variation in Treatment^a Mean
Attributable to Classrooms and Schools for Fifth Grade Pupils on the
Stanford Achievement Test.

                              Proportion of Variation
Subtest                    Classrooms (N = 19)   Schools (N = 10)
Word meaning                      .18                  .62
Paragraph meaning                 .48                  165
Spelling                          .18                  .52
Language                          .50                  .31
Arithmetic computation            .68                  .14
Arithmetic concepts               .59                  .11
Arithmetic application            .38                  .32
Social studies                    .44                  .88
Science                           .06                  .72

a. The proportions of variance were actually reconstructed by Wiley and
Bock from the variance components under the assumption that each
“treatment” is presented by one classroom of thirty pupils in each
school. The proportions refer to the variance in “treatment” means.
Source: Wiley and Bock, 1967.


$$Y = \alpha_s + \beta_{(c):s} + \gamma_{(p):cs}, \tag{3.11}$$

where $\alpha_s$, $\beta_{(c):s}$, and $\gamma_{(p):cs}$ represent
effects of schools, classrooms within schools, and pupils within
classrooms, respectively.
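
The partition implied by the nested model can be sketched numerically. The example below is only an illustration under stated assumptions: the data are simulated, the function name `nested_ss_shares` and all parameter values are hypothetical, and the shares are simple sum-of-squares proportions rather than the variance-component estimates that Wiley and Bock reconstructed.

```python
import numpy as np

def nested_ss_shares(y, school, classroom):
    """Share of the total sum of squares lying between schools, between
    classrooms within schools, and between pupils within classrooms."""
    y = np.asarray(y, dtype=float)
    school = np.asarray(school)
    classroom = np.asarray(classroom)
    grand_mean = y.mean()

    school_mean = np.empty_like(y)
    class_mean = np.empty_like(y)
    for s in np.unique(school):
        in_school = school == s
        school_mean[in_school] = y[in_school].mean()
        # classroom labels only need to be unique within a school
        for c in np.unique(classroom[in_school]):
            in_class = in_school & (classroom == c)
            class_mean[in_class] = y[in_class].mean()

    ss_total = np.sum((y - grand_mean) ** 2)
    return {
        "schools": np.sum((school_mean - grand_mean) ** 2) / ss_total,
        "classrooms": np.sum((class_mean - school_mean) ** 2) / ss_total,
        "pupils": np.sum((y - class_mean) ** 2) / ss_total,
    }

# Simulated data (hypothetical design: 10 schools, 2 classrooms each, 30 pupils)
rng = np.random.default_rng(0)
n_schools, n_classes, n_pupils = 10, 2, 30
school = np.repeat(np.arange(n_schools), n_classes * n_pupils)
classroom = np.tile(np.repeat(np.arange(n_classes), n_pupils), n_schools)
alpha = rng.normal(0.0, 1.0, n_schools)               # school effects
beta = rng.normal(0.0, 1.0, (n_schools, n_classes))   # classroom effects
y = alpha[school] + beta[school, classroom] + rng.normal(0.0, 2.0, school.size)

shares = nested_ss_shares(y, school, classroom)
print(shares)
```

Because the three deviations are mutually orthogonal in a nested design, the three shares sum to one; no background factors are controlled, so the school and classroom shares mix causal and selection effects.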


Background factors have not been taken into account in either of these
sets of analyses. The studies, however, do reflect variations in
sampling frame that should be associated with the likelihood of
selection mechanisms. The Wiley-Bock data come from a single suburban
community in northern Illinois and probably reflect essentially random
mechanisms for allocation to schools. The ECE data are drawn from
schools in moderate- to large-sized school districts. Moreover, schools
receiving some compensatory education funding and ECE funding (based on
poverty, partly on willingness to “reform”) were overrepresented
because the purpose of the study was to evaluate the effects of ECE and
of compensatory funding plus ECE.

Taken at face value, the data from Tables 3-3 and 3-4 suggest several
comments about school and classroom effects.


Wiley-Bock data:

1. School and classroom effects are large.

2. The variation associated with classrooms or schools differs
according to content area, with more between school variation in
reading, spelling, and science and more class within school variation
in mathematics, social studies, and language.


ECE data: 

3. School and classroom effects assessed by criterion-referenced tests 
are moderate at best in both grades 2 and 3. 

4. Classroom within school effects are larger than between school 
effects in math, while the reverse is true in reading. 


Murnane (1975) attempted to specify properly student and com- 
munity inputs to schools and classrooms so that any residual effects 
could be attributed to classroom or school effects. The contribution 
of classroom and school membership to variation in spring achieve- 
ment test scores was determined after controlling for the pupils' 
previous spring test scores, community background, sex, and at- 
tendance. 

Though the Murnane study involves second and third grade children, as
does the ECE study, his schools were all Title I schools in New Haven,
Connecticut. The sample is much more restricted than in the ECE study.
This homogeneity provides a further control for the selection mechanism
that might be used to explain any group effects. The results from
Murnane’s study suggest the following interpretations.


Murnane data: 


5. Moderate school and classroom effects exist even after controlling 
for inputs. 


6. The effects are somewhat larger in math than in reading. 


7. The effects associated with classrooms within schools are slightly 
though consistently larger than the effects associated with schools. 


Interpretations. 


The studies cited vary along several dimensions. In fact, the main
similarity among them is the existence of statistically and practically
significant effects of classrooms and schools. Of course, the magnitude
of the effects reported varies as a function of grade, sampling frame,
subject matter, and degree of control for individual background factors
and selection effects. However, the pattern of effects is not
unreasonable. The effects are larger for the more homogeneous samples
at the elementary grades (Murnane and Wiley-Bock as opposed to ECE).


The differences in effects across subject matter are particularly
informative. Mathematics instruction exhibits stronger effects than
reading does across grades, with most of the impact associated with
classroom-to-classroom differences. This latter finding is as expected.
While home influences have a stronger impact on reading, mathematics is
a subject matter largely learned in school (with, perhaps, the
exception of number recognition and counting). Moreover, while
mathematics, like reading, is offered in virtually all elementary
school classes, teachers’ skills in instruction and their attitudes
toward mathematics, which together presumably determine the overall
quantity, variety, and quality of mathematics instruction in the
elementary grades, vary greatly among classrooms within schools (e.g.,
Burstein, 1978b; Dishaw, 1977; Filby and Fisher, 1977). These
variations in skill and attitude are presumably much smaller between
schools. In contrast, the quantity and coverage of reading instruction
are presumably more homogeneous across classrooms. These differences in
the relative magnitudes of classroom and school effects for
mathematics, both by themselves and also when compared with reading,
can be explained by the influence of pupil background and by the
differences in teacher skills and attitudes within schools.

While the existence of group effects at the classroom and school levels
is evident from the above examples, we have not provided any indication
about how these effects arise. The two types of analyses presented
(nested analysis of variance and regression analyses with classroom and
teacher dummy variables) can serve only one limited purpose: they can
tell the investigator whether it is important to consider school and
classroom influences on pupil outcomes. Assuming no interactions
between individual background and classroom-school factors, these
analyses provide an upper bound on the amount of variation in
individual outcomes that can be attributed to classroom or school level
variables.


Specification and Identification of Contextual Effects

As mentioned in the previous section, the use of classroom or school
dummy variables establishes only whether group effects exist; it does
not identify the mechanism that produces them. Research on contextual
effects (Alexander and Eckland, 1975; Alwin, 1976; Campbell and
Alexander, 1965; Firebaugh, 1977; Hauser, 1970, 1971, 1974; Meyer,
1970; Nelson, 1972; among many others) represents an attempt at more
direct assessment of group effects on individual outcomes.


Typically, context has been operationalized by aggregating the
observations of group members, $X_{ij}$, on the variable of interest.
Then so-called contextual effects refer to the effect of this aggregate
measure of context, $\bar{X}_{i\cdot}$, on individual outcomes,
$Y_{ij}$, net of the individual's effect, $X_{ij}$, for the same
variable. The analytical results for contextual effects, frog-pond
effects, and individual effects are developed below. The presentation
draws heavily from Firebaugh (1977).


Preliminary Specification. For a single independent variable, the
structural equation for the contextual effects model can be written as

$$Y_{ij} = \alpha + \beta_{YX\cdot\bar{X}} X_{ij} + \beta_{Y\bar{X}\cdot X} \bar{X}_{i\cdot} + \epsilon_{ij}, \tag{3.12}$$

where $Y_{ij}$ and $X_{ij}$ are individual level measures on $Y$ and
$X$ for person $j$ in group $i$, $\bar{X}_{i\cdot}$ is the mean for
group $i$ on variable $X$, and $\epsilon_{ij}$ is a random disturbance
term with the usual least-squares properties.

The coefficient $\beta_{Y\bar{X}\cdot X}$ in equation (3.12) is the
standard measure of the contextual effect. If $\beta_{Y\bar{X}\cdot X}$
is significantly different from zero, group context is said to have an
independent effect on individual outcomes (Alwin, 1976; Farkas, 1974;
Firebaugh, 1977, 1978; Hauser, 1971, 1974). That is, in a study of the
effects of ability on achievement, a significant
$\beta_{Y\bar{X}\cdot X}$ is interpreted to mean that the level of
ability of the group (as measured by $\bar{X}_{i\cdot}$) has an
independent effect on individual achievement. An ability context is
then given a substantive interpretation (see, e.g., Alexander and
Eckland, 1975).

The so-called individual effect is measured by the coefficient
$\beta_{YX\cdot\bar{X}}$. If $\beta_{YX\cdot\bar{X}}$ is significant,
an individual’s ability is said to have an independent effect on the
individual’s achievement after controlling for the effects of group
mean ability.

Intuitively, the results from an application of equation (3.12) in 
the example described would seem to be theoretically sound. That is, 
all things being equal, we would expect an individual’s ability to have 
a significant impact on performance. Moreover, group mean ability 
can be expected to influence instructional practices by causing the 
teacher (school) to adjust instruction to the level of the students in 
the class (school). As a result of the ability context effect on in- 
structional practices, individual students within the class (school) can 
be expected to learn more or less than they would in other classes 
(schools). 
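
As a minimal sketch of how equation (3.12) might be fitted, the example below simulates pupils nested in classes, aggregates ability to the class mean, and estimates the individual and contextual coefficients by ordinary least squares. Everything in it is an assumption made for illustration: the data are simulated, and the generating values (0.8 for the individual effect, 0.3 for the contextual effect) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_groups, n_per_group = 40, 25
group = np.repeat(np.arange(n_groups), n_per_group)

# Individual ability X_ij: a class-level shift plus pupil-level variation
x = rng.normal(0.0, 1.0, n_groups)[group] + rng.normal(0.0, 1.0, group.size)

# Group mean ability, repeated for every pupil in the group
group_means = np.bincount(group, weights=x) / np.bincount(group)
x_bar = group_means[group]

# Hypothetical generating values for the individual and contextual effects
true_individual, true_contextual = 0.8, 0.3
y = (2.0 + true_individual * x + true_contextual * x_bar
     + rng.normal(0.0, 0.5, group.size))

# Equation (3.12): regress Y on X_ij and the group mean of X
design = np.column_stack([np.ones_like(x), x, x_bar])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
alpha_hat, b_individual, b_contextual = coef
print(b_individual, b_contextual)
```

The estimates recover the generating values only because the simulated data contain no selection into groups and no frog-pond term; with real data the same two coefficients admit the alternative readings discussed in what follows.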

The theoretical interpretations of individual and contextual effects
presented above emphasize the role of two learning principles, learning
ability and opportunity to learn, in the achievement process. That is,
on an absolute scale, individuals learn more (less)
because (1) they begin with higher (lower) ability and (2) they learn 
in groups where instructional practices offer more (high ability 
group) or less (low ability group) content to be learned. To make 
the point even more concrete, Jane cannot solve fractions problems 
because either (1) she lacks the ability to solve fractions (an in- 
dividual effect) or (2) though capable if taught, she was not given 
the opportunity to learn fractions because the teacher felt that the 
class was not ready for the topic or needed more instruction in 
multiplication instead. 

Most of the sociological literature on school context effects does not
base interpretations of such effects on such psychological grounds as
opportunity to learn. Instead, sociological or social-psychological
interpretations are offered. According to a standard interpretation,
group values on $X$ affect the individual by “setting and enforcing
standards for the person” (Kelly, 1952; see also Alwin, 1976;
Firebaugh, 1977; Meyer, 1970) and by functioning as “comparison
point(s) against which the person can evaluate himself” (Kelly,
1952:413; see also Davis, 1966; Firebaugh, 1977; St. John, 1971). These
descriptions refer to contextual and frog-pond effects, respectively.
According to the former, the motivation associated with either peer
pressure or social sanctions causes Jane to achieve higher (lower) than
she would have otherwise. If the fact that Jane is high (low) when
compared with the ability of her class (school) causes her to perform
lower (higher) or higher (lower) than she would have otherwise, a
frog-pond effect, presumably due to the motivational effects of one's
relative standing within the group, is said to exist.

A structural model that includes both contextual and frog-pond effects
can be written as

$$Y_{ij} = \alpha + \beta_{Y(X-\bar{X})\cdot\bar{X}} (X_{ij} - \bar{X}_{i\cdot}) + \beta_{Y\bar{X}\cdot(X-\bar{X})} \bar{X}_{i\cdot} + \epsilon_{ij}, \tag{3.13}$$

where $(X_{ij} - \bar{X}_{i\cdot})$ is the measure of relative standing
of a person $j$ in group $i$, $\beta_{Y(X-\bar{X})\cdot\bar{X}}$ is the
coefficient associated with the frog-pond effect, and
$\beta_{Y\bar{X}\cdot(X-\bar{X})}$ is the coefficient associated with
the contextual effect. We could have written the coefficients as
$\beta_{Y(X-\bar{X})}$ and $\beta_{Y\bar{X}}$, since $(X - \bar{X})$
and $\bar{X}$ are uncorrelated.

Note that equation (3.13) does not include a measure of the individual
effect, $X_{ij}$. Only relative standing and group mean performance are
specified to affect microlevel outcomes. Instead, we might have argued
on psychological grounds that only individual pupil ability, $X_{ij}$,
and relative ability within the group, $X_{ij} - \bar{X}_{i\cdot}$,
might affect individual outcomes. According to this view, a student's
relative standing within the group could determine the amount of
instructional resources that the teacher allocates to the student and
hence the amount that the student learns over and above level of
ability.2 A structural model incorporating individual and frog-pond
effects can be written as

$$Y_{ij} = \alpha + \beta_{YX\cdot(X-\bar{X})} X_{ij} + \beta_{Y(X-\bar{X})\cdot X} (X_{ij} - \bar{X}_{i\cdot}) + \epsilon_{ij}, \tag{3.14}$$

where the coefficient $\beta_{YX\cdot(X-\bar{X})}$ is interpreted to
measure the individual effect while $\beta_{Y(X-\bar{X})\cdot X}$
measures the frog-pond effect.

It should be obvious to the reader that (1) plausible alternative
sociological and psychological interpretations of contextual and
frog-pond effects have been offered and (2) models involving
potentially different estimates of all three effects (individual,
contextual, and frog-pond) have been generated. Table 3-6 summarizes
both the substantive interpretations and possible estimators of the
three effects of ability on achievement, with class defining the group
membership in the two effect models. Similar interpretations can be
offered for socioeconomic background (SES) and other input
characteristics, for other outcomes besides achievement, and for other
group membership indexes (e.g., school, school district, etc.).

We can carry this theoretical exercise one step further and argue that
all three factors (individual ability, group mean ability, and the
individual’s ability relative to that of the group) affect individual
level outcomes. The specification of this model is given by the
structural equation

$$Y_{ij} = \alpha + \beta_1 X_{ij} + \beta_2 \bar{X}_{i\cdot} + \beta_3 (X_{ij} - \bar{X}_{i\cdot}) + \epsilon_{ij}, \tag{3.15}$$

with $\beta_1$, $\beta_2$, and $\beta_3$ representing the coefficients
for individual ($X_{ij}$), contextual ($\bar{X}_{i\cdot}$), and
frog-pond ($X_{ij} - \bar{X}_{i\cdot}$) effects, respectively.


Analytical Complications. Unfortunately, the model represented by
equation (3.15) is not estimable by any standard means. That is,
$\beta_1$, $\beta_2$, and $\beta_3$ cannot be estimated simultaneously,
since $X_{ij}$, $\bar{X}_{i\cdot}$, and $X_{ij} - \bar{X}_{i\cdot}$ are
linearly dependent (Burstein and Miller, 1978; Cronbach, 1976;
Firebaugh, 1977). As a result, the same data can be generated by a
variety of alternative combinations of individual, contextual, and
frog-pond effects. Thus, when measured by $X_{ij}$,
$\bar{X}_{i\cdot}$, and $X_{ij} - \bar{X}_{i\cdot}$, respectively,
these three effects cannot be estimated simultaneously.

Table 3-6. Substantive Interpretations and Estimators of Individual,
Contextual, and Frog-Pond Effects of Ability on Achievement in
Classrooms in Two Effect Models.

Type of Effect: Individual
  Alternative Interpretations:
    1. A student's ability affects the student's learning and hence
       measured achievement.
  Estimators^a: $\beta_{YX\cdot\bar{X}}$; $\beta_{YX\cdot(X-\bar{X})}$

Type of Effect: Contextual
  Alternative Interpretations:
    1. Psychological (opportunity to learn): group ability affects
       instructional practice (amount of instructional time, topics
       covered, way in which topics are taught), which, in turn,
       affects individual learning and achievement.
    2. Sociological (peer pressure and social sanction): group ability
       affects individual motivation to learn and hence individual
       learning and achievement.
  Estimators^a: $\beta_{Y\bar{X}\cdot X}$;
    $\beta_{Y\bar{X}\cdot(X-\bar{X})} = \beta_{Y\bar{X}}$

Type of Effect: Frog-Pond
  Alternative Interpretations:
    1. Psychological (opportunity to learn): the student's relative
       standing within the group affects the allocation of
       instructional resources and style of instruction provided the
       student and thereby the student’s learning and achievement.
    2. Sociological (relative status effects): relative standing in
       the group affects individual motivation to learn and thereby
       individual learning and achievement.
  Estimators^a: $\beta_{Y(X-\bar{X})\cdot\bar{X}} = \beta_{Y(X-\bar{X})}$;
    $\beta_{Y(X-\bar{X})\cdot X}$

a. The estimators are the coefficients from equations (3.12) through
(3.14) for $X_{ij}$, $\bar{X}_{i\cdot}$, and
$X_{ij} - \bar{X}_{i\cdot}$.

The problems in estimating equation (3.15) have implications for the
models incorporating only two of the three effects. If we, in turn, set
$\beta_3$, $\beta_1$, and $\beta_2$ equal to zero, we would obtain
equations (3.12), (3.13), and (3.14), respectively. But if this is so,
these three specifications of individual, contextual, and frog-pond
effects ($\beta_3 = 0$, $\beta_1 = 0$, and $\beta_2 = 0$) are
observationally equivalent, and none can be ruled out simply on
technical grounds. Thus, the estimators in Table 3-6 do not represent
separable effects, and questions about which of the two estimates of
each effect is the correct one ($\beta_{YX\cdot\bar{X}}$ or
$\beta_{YX\cdot(X-\bar{X})}$ for individual, $\beta_{Y\bar{X}\cdot X}$
or $\beta_{Y\bar{X}\cdot(X-\bar{X})}$ for contextual, and
$\beta_{Y(X-\bar{X})\cdot\bar{X}}$ or $\beta_{Y(X-\bar{X})\cdot X}$ for
frog-pond) become less of an issue.
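
The linear dependency and the resulting observational equivalence can be checked numerically. The sketch below uses simulated data and hypothetical coefficient values; it relies only on the algebraic fact that $\beta_1 X + \beta_2 \bar{X} + \beta_3 (X - \bar{X}) = (\beta_1 + \beta_3) X + (\beta_2 - \beta_3) \bar{X}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n_groups, n_per_group = 30, 20
group = np.repeat(np.arange(n_groups), n_per_group)
x = rng.normal(0.0, 1.0, n_groups)[group] + rng.normal(0.0, 1.0, group.size)
x_bar = (np.bincount(group, weights=x) / np.bincount(group))[group]

# Design matrix of equation (3.15): constant, X, group mean, deviation
design = np.column_stack([np.ones_like(x), x, x_bar, x - x_bar])
rank = np.linalg.matrix_rank(design)   # four columns, but only rank 3
print(design.shape[1], rank)

def systematic_part(alpha, b1, b2, b3):
    """Systematic part of equation (3.15) for given coefficient values."""
    return alpha + b1 * x + b2 * x_bar + b3 * (x - x_bar)

# Three hypothetical "theories", each setting a different effect to zero
no_frog_pond = systematic_part(1.0, 0.8, 0.3, 0.0)     # form of (3.12)
no_individual = systematic_part(1.0, 0.0, 1.1, 0.8)    # form of (3.13)
no_contextual = systematic_part(1.0, 1.1, 0.0, -0.3)   # form of (3.14)

# All three imply exactly the same expected outcome for every pupil
print(np.allclose(no_frog_pond, no_individual),
      np.allclose(no_frog_pond, no_contextual))
```

Since the three coefficient sets generate identical expected outcomes, no amount of data on $Y$, $X$, and group membership alone can favor one of them.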

The interdependency of the estimators from the three effects can be
approached from an entirely different perspective. It has been shown
(Duncan, Cuzzort, and Duncan, 1961; Cronbach, 1976) that

$$\beta_T = \eta_X^2 \beta_B + (1 - \eta_X^2) \beta_W,$$

where $\beta_T$ is the coefficient from the between student regression
of $Y_{ij}$ on $X_{ij}$, $\beta_B$ is the coefficient from the weighted
(by group size) between group regression of $\bar{Y}_{i\cdot}$ on
$\bar{X}_{i\cdot}$, $\beta_W$ is the coefficient from the pooled within
group regression of $Y_{ij} - \bar{Y}_{i\cdot}$ on
$X_{ij} - \bar{X}_{i\cdot}$, and $\eta_X^2$ is the ratio of the between
group variation on $X$ to the total variation on $X$ (the correlation
ratio for variable $X$).
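
The decomposition can be verified on any grouped data set. The following sketch does so for simulated data with unequal group sizes; the generating values are hypothetical, and the identity holds exactly in the sample whatever values are used.

```python
import numpy as np

rng = np.random.default_rng(3)
sizes = rng.integers(10, 40, size=30)            # unequal group sizes
group = np.repeat(np.arange(sizes.size), sizes)
x = rng.normal(0.0, 1.0, sizes.size)[group] + rng.normal(0.0, 1.0, group.size)
y = (0.5 * x + rng.normal(0.0, 1.0, sizes.size)[group]
     + rng.normal(0.0, 1.0, group.size))

counts = np.bincount(group)
x_bar = (np.bincount(group, weights=x) / counts)[group]   # group mean of X
y_bar = (np.bincount(group, weights=y) / counts)[group]   # group mean of Y

ss_total = np.sum((x - x.mean()) ** 2)
ss_between = np.sum((x_bar - x.mean()) ** 2)   # weighted by group size
ss_within = np.sum((x - x_bar) ** 2)

beta_total = np.sum((x - x.mean()) * (y - y.mean())) / ss_total
beta_between = np.sum((x_bar - x.mean()) * (y_bar - y.mean())) / ss_between
beta_within = np.sum((x - x_bar) * (y - y_bar)) / ss_within
eta_squared = ss_between / ss_total            # correlation ratio for X

reconstructed = eta_squared * beta_between + (1 - eta_squared) * beta_within
print(beta_total, reconstructed)
```

Repeating each group mean once per pupil is what makes the between group regression "weighted by group size" in this sketch.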

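Given the above decomposition of the individual regression into between
group and within group regressions, we can show (cf. Burstein, 1977;
Firebaugh, 1977; Werts and Linn, 1971) that in equation (3.12),

$$\beta_{YX\cdot\bar{X}} = \beta_W \quad \text{and} \quad \beta_{Y\bar{X}\cdot X} = \beta_B - \beta_W; \tag{3.16}$$

in equation (3.13),

$$\beta_{Y(X-\bar{X})\cdot\bar{X}} = \beta_{Y(X-\bar{X})} = \beta_W \quad \text{and} \quad \beta_{Y\bar{X}\cdot(X-\bar{X})} = \beta_{Y\bar{X}} = \beta_B; \tag{3.17}$$

and in equation (3.14),

$$\beta_{YX\cdot(X-\bar{X})} = \beta_B \quad \text{and} \quad \beta_{Y(X-\bar{X})\cdot X} = \beta_W - \beta_B. \tag{3.18}$$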


Thus, no matter which specification is chosen, the three effects
(individual, contextual, and frog-pond) can be shown to be various
analytical combinations of two regression coefficients: the between
group coefficient, $\beta_B$, and the pooled within group coefficient,
$\beta_W$. As before, we see that what appear to be three substantively
interpretable effects are linearly dependent when measured by
$X_{ij}$, $\bar{X}_{i\cdot}$, and $X_{ij} - \bar{X}_{i\cdot}$, and only
two coefficients are uniquely estimable.
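
Relations (3.16) through (3.18) can be confirmed by fitting the three two-regressor models to the same data. The sketch below is illustrative only: the data are simulated, and the helper `slopes` is a hypothetical convenience function, not part of any published procedure.

```python
import numpy as np

rng = np.random.default_rng(4)
sizes = rng.integers(10, 40, size=30)
group = np.repeat(np.arange(sizes.size), sizes)
x = rng.normal(0.0, 1.0, sizes.size)[group] + rng.normal(0.0, 1.0, group.size)
y = (1.0 + 0.5 * x + rng.normal(0.0, 1.0, sizes.size)[group]
     + rng.normal(0.0, 1.0, group.size))

counts = np.bincount(group)
x_bar = (np.bincount(group, weights=x) / counts)[group]
y_bar = (np.bincount(group, weights=y) / counts)[group]

# Between group (weighted by group size) and pooled within group slopes
beta_b = (np.sum((x_bar - x.mean()) * (y_bar - y.mean()))
          / np.sum((x_bar - x.mean()) ** 2))
beta_w = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

def slopes(*regressors):
    """OLS slopes of y on the given regressors, with an intercept."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

eq_312 = slopes(x, x_bar)           # individual, contextual
eq_313 = slopes(x - x_bar, x_bar)   # frog-pond, contextual
eq_314 = slopes(x, x - x_bar)       # individual, frog-pond

print(eq_312, [beta_w, beta_b - beta_w])   # relation (3.16)
print(eq_313, [beta_w, beta_b])            # relation (3.17)
print(eq_314, [beta_b, beta_w - beta_b])   # relation (3.18)
```

All six coefficients are reproduced from the two quantities $\beta_B$ and $\beta_W$, which is the sense in which only two coefficients are uniquely estimable.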


Empirical Illustration. The interrelations among individual,
contextual, and frog-pond effects can be illustrated using data on the
science achievement of students from the IEA Six Subjects Survey. The
models exhibited in the six columns of Table 3-7 are variations of the
general model


RSCI = f(RWK, POPOCC, GRADE, SCISTUDY),

where

RSCI = total raw score on the science achievement test;
RWK = total raw score on the word knowledge test;
POPOCC = student’s report of father’s occupation;
GRADE = the grade in which the student is enrolled; and
SCISTUDY = student’s report of exposure to science.


All coefficients reported in the table are larger than twice their
standard errors.

The first three columns of Table 3-7 report the three single level
regression analyses: between student (total), between school, and
within school. The last three columns present results from models that
posit individual and contextual effects of verbal ability (RWK) and
socioeconomic background (POPOCC) on achievement (as in equation
[3.12]), individual and frog-pond effects (as in equation [3.14]), and
contextual and frog-pond effects (as in equation [3.13]) for POPOCC and
RWK.

In columns (4) through (6), GRADE and SCISTUDY are not represented at
both the student and school levels, and this specification accounts for
departures of the reported coefficients from the relations in equations
(3.16) through (3.18). Were these variables included at both the
student and school levels, we would expect from equation (3.16) that in
column (4) the coefficient for individual RWK would equal the within
school coefficient (.792) and the coefficient for school mean RWK would
equal the between school coefficient minus the within school
coefficient ($1.018 - .792 = .226$). Similarly, in column (5) equation
(3.17) leads us to expect coefficients of .792 and 1.018, and in column
(6) equation (3.18) leads us to expect 1.018 and $-.226$. The estimates
actually reported in Table 3-7 depart somewhat from these predicted
values.


A Theoretical Resolution

The results in Table 3-7 reinforce the points made above about the
linear dependency among individual, contextual, and frog-pond effects
as measured by $X_{ij}$, $\bar{X}_{i\cdot}$, and
$X_{ij} - \bar{X}_{i\cdot}$, respectively. No amount of statistical
manipulation of these three quantities can yield three separate effects
given the linear dependency, and as a result, empirical procedures
cannot estimate any two of the effects without making an assumption
about the third. The key to the separation and identification of the
three effects lies elsewhere. Firebaugh (1977) suggests three methods
of breaking the linear dependency:

1. Delete one of the variables on theoretical grounds;
2. Use more direct measures of the frog-pond and/or contextual effects;
3. Use different variables in measuring frog-pond and contextual
   effects.

All three involve a respecification of equation (3.15) on the basis of
theoretical considerations.

Choosing the first method yields one of the three two-regressor
specifications already represented by equations (3.12), (3.13), and
(3.14). The distinction here is that one arrives at one of these
specifications on theoretical grounds. That is, one is asserting that
either individual, frog-pond, or contextual effects do not exist.

This type of theoretical resolution of a methodological problem is
commonplace in social science research. For example, the three variable
simple causal chain ($X \rightarrow Y \rightarrow Z$) is
observationally equivalent to the spurious causation model ($Y$ is a
prior cause of $X$ and $Z$, but $X$ and $Z$ are not directly related).
However, this does not imply that the two models cannot be
distinguished on theoretical grounds.

The second method places the burden on the investigator to think more
carefully about the way in which frog-pond and contextual effects are
measured. Going back to our ability-achievement example, we might
reconsider what is meant by the effect of the student's relative
ability standing on performance. We have already offered two
substantive interpretations of frog-pond effects (Table 3-6), one
having to do with relative allocations of instructional resources due
to the student’s relative standing, the other with motivation to
achieve as a function of relative standing. Another interpretation
might be that the teacher's perception of the student's relative
standing might affect the teacher's expectations, which, if transmitted
to the student, affect performance.
In practice, the three separate substantive interpretations of
frog-pond effects can be measured more precisely by alternatives to the
$X_{ij} - \bar{X}_{i\cdot}$ measure. For example, the difference
between the instructional resources allocated to a student and the
resources allocated to
the average student in the class is a more direct measure of the frog- 
pond effect of relative allocation of resources. A student's response 
to a self-concept of academic ability scale may be a better measure 
of the frog-pond effect associated with perception of relative aca- 
demic standing. Finally, the teacher's judgments of the pupil's 
academic ability more directly capture the frog-pond effect en- 
visioned to operate through teacher perceptions and expectations. 
Any of these three measures (the instructional resource measure,
student self-concept of academic ability, and teacher perception of
academic ability) might be substituted into equation (3.15) for
$X_{ij} - \bar{X}_{i\cdot}$ to specify more directly the frog-pond
effect and to break the linear dependency. Similarly, we could replace
$\bar{X}_{i\cdot}$ as a measure of the contextual effect with more
direct measures of the processes that the contextual effect is supposed
to represent. Replacement of either $\bar{X}_{i\cdot}$ or
$X_{ij} - \bar{X}_{i\cdot}$ requires theoretical support but affords
improved opportunities to interpret the process directly affecting
individual outcomes.
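
A brief sketch may show why substituting a direct measure breaks the dependency. Here a hypothetical variable, `teacher_rating`, stands in for a direct measure of the frog-pond process; it is related to relative standing but is not an exact linear function of $X_{ij}$ and $\bar{X}_{i\cdot}$. The data and all generating values are simulated assumptions, not results from any study cited here.

```python
import numpy as np

rng = np.random.default_rng(5)
n_groups, n_per_group = 30, 20
group = np.repeat(np.arange(n_groups), n_per_group)
x = rng.normal(0.0, 1.0, n_groups)[group] + rng.normal(0.0, 1.0, group.size)
x_bar = (np.bincount(group, weights=x) / np.bincount(group))[group]

# Hypothetical direct measure: teacher's rating of the pupil's ability,
# influenced by relative standing but carrying variation of its own
teacher_rating = 0.6 * (x - x_bar) + rng.normal(0.0, 1.0, group.size)

ones = np.ones_like(x)
mechanical = np.column_stack([ones, x, x_bar, x - x_bar])
direct = np.column_stack([ones, x, x_bar, teacher_rating])

rank_mechanical = np.linalg.matrix_rank(mechanical)   # deficient
rank_direct = np.linalg.matrix_rank(direct)           # full column rank

# With the direct measure, all three coefficients are estimable
y = (1.0 + 0.5 * x + 0.3 * x_bar + 0.4 * teacher_rating
     + rng.normal(0.0, 0.5, group.size))
coef, *_ = np.linalg.lstsq(direct, y, rcond=None)
print(rank_mechanical, rank_direct, coef[1:])
```

Estimability here is purchased by the theoretical claim that the teacher's rating, and not the mechanical deviation score, carries the frog-pond process; the arithmetic cannot validate that claim.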

Finally, method 3 involves the recognition that frog-pond and
contextual effects might involve different aspects of the group.
Firebaugh cites Alexander and Eckland's (1975; see also Meyer, 1970)
argument that, for educational attainment, the SES level of the school
determines the school's contextual effect, while the student's relative
ability level within the school might determine the frog-pond effect.
In this instance, two distinct characteristics (SES and ability) are
viewed as antecedents of the two effects of group membership, while in
method 2 alternative manifestations of the same underlying
characteristic (ability) were suggested.

The shift to direct measurement of the processes believed to determine
frog-pond and contextual effects (methods 2 and 3) seems warranted
regardless of whether one chooses to delete one of the variables from
the model (method 1). And once this shift has been made, contextual and
frog-pond effects as operationalized by $\bar{X}_{i\cdot}$ and
$X_{ij} - \bar{X}_{i\cdot}$ cease to exist. Instead, the distinctions
of contextual and, for that matter, compositional and frog-pond effects
from group and structural effects fade away. In some cases, the
distinction between the types of group effects and types of individual
effects that occur in a group becomes blurred (cf., e.g., the
alternative substantive interpretations of frog-pond effects).

In any event, we are left in the same camp with Hauser (1974) and
Firebaugh (1977), although disagreement over terminology persists.
Although more proximal than the use of class and school dummy
variables, contextual and frog-pond effects as measured by
$\bar{X}_{i\cdot}$ and $X_{ij} - \bar{X}_{i\cdot}$ are still too
mechanical and are distally related to the sociological or
psychological processes that they are intended to represent. Instead,
following Firebaugh (1977:14) and, implicitly, Hauser, we conclude
that:


1. Analyses involving both individual level and group level effects
should be based on careful theory in which both the source and form of
group effects are specifically stated.

2. As a result, contextual effects and frog-pond effects should be
measured directly.

3. Once direct measures are specified, terminology more adequately
describing these measures should be used in place of “contextual” and
“frog-pond” designations.


While the first two points reflect substantive matters, the third is in
response to a more emotional issue. The language that one chooses to
communicate theory in the social sciences is, in many respects,
arbitrary. However, in educational effects research, the terms
“contextual” and “frog-pond” are strongly identified with a dispute
that is more form than substance. Therefore, in line with Hauser, I
suggest that in order to encourage more careful examination of school
and classroom processes, investigators drop both the offending practice
(the use of $\bar{X}_{i\cdot}$ and $X_{ij} - \bar{X}_{i\cdot}$
mechanically) and the offending language. More direct measures and
language more descriptive of the processes that operate within and
between groups are available and should be used.


APPROPRIATE UNITS OF ANALYSIS 


When faced with the analysis of multilevel data, many researchers
assume that there is only one appropriate unit (pupil, class, school,
etc.) and proceed to justify their choice of a unit of analysis and
hence the level at which all analyses are conducted. This attempt to
delineate the level at which the research question is addressed is a
logical consequence of the problems of cross-level inference discussed
earlier. Since models analyzed at different levels rarely yield similar
results, investigators assume that the analysis at only one of the
levels is correct. Unfortunately, this assumption is often unwarranted.


Competing Arguments about Choice of Units 

Traditionally, a variety of competing reasons have been cited as
justification for the choice of either students or groups as the
appropriate units of analysis in studies of educational effects. Both
conceptual and statistical arguments have been voiced in favor of
either level or against making a choice. A few of the key arguments
about choosing the appropriate unit of analysis can be stated as
follows:


Pupils as the Appropriate Unit 

1. In education the phenomena to be investigated are pupil outcomes.
More specifically, we want to determine the effects of the
educational resources that an individual pupil receives and his or
her background, and the influence of his or her community
setting and peers on the individual pupil’s educational outcomes
(Averch et al., 1972). Therefore, pupils are the units for which
questions must finally be addressed.

2. Pupils react as individuals, and the effects on them should be
the focus of educational evaluation (Bloom, in discussion in
Wittrock and Wiley, 1970:271 ff.).

3. Effects in the classrooms are an aggregation of effects of
environment arrangements on individuals (Glaser, in discussion in
Wittrock and Wiley, 1970:271 ff.).

4. In classroom interaction research, most teacher behavior directed
at students is directed at individuals rather than at the whole
class, and student individual differences affect such teacher
behavior. Even teacher behavior directed at the whole class interacts
with student individual differences to determine outcomes
(Brophy, 1975).

5. Theoretical arguments concerning the effects of educational
structure on pupil outcomes are formulated at the pupil level.
To analyze the data at the group level is to enhance the likelihood
of specification and aggregation bias (Hannan, Freeman,
and Meyer, 1976).


Groups (Classes, Schools, etc.) as Units 
6. The appropriate unit of study in educational evaluation is the
collective (class or school) rather than the individual. The effects

7. The sampling unit determines the unit of analysis. If classrooms
are the sampling units, they are also the units of analysis (Cline 
et al., 1974; Cronbach, 1976). 

8. The unit of treatment defines the level of analysis. If treatments 
are administered to intact classrooms (schools), they are the 
units (Cline et al., 1974; Cronbach, 1976; Glass and Stanley,
1970). 

9. The treatments received by pupils and thereby their performance 
within classrooms (schools) are generally intercorrelated. These 
dependencies dictate the choice of between class (school) analyses 
(Glass and Stanley, 1970; Glendening, 1976). 

10. Characteristics of the teacher (school) take on the same value for
every pupil in a particular classroom (school). An analysis at the 
pupil level overemphasizes the amount of information one has 
about class (school) level variables (Keesling and Wiley, 1974). 


Other Considerations 

11. The same manifest variables answer different questions at differ- 
ent levels of analysis. Therefore, the research foci should deter- 
mine the appropriate units of analysis (Burstein, 1976a, 1977, 
1978a; Burstein and Miller, 1978; Cronbach, 1976; Haney, 
1974).

12. Dependence among observations within classrooms (schools) is
a matter of degree rather than existence. Therefore, choosing the
group as the unit of analysis simply because students receive
instruction together in classrooms (schools) is inappropriate.
Approaches that investigate degree of dependency and adjust for the
effects of dependency are more appealing than automatic aggregation
to the classroom (school) level (Burstein, 1977; Burstein
and Knapp, 1975; Glendening, 1976; Webb, 1977).

13. Analyses of class (school) means can mask important between
class differences in the within class distributions of outcomes
and the relation of outcomes to inputs. Thus, the use of group
means as the only unit of analysis is inappropriate (Brown and
Saks, 1975; Burstein, 1976a, 1977, 1978a; Burstein, Linn, and
Capell, 1978; Klitgaard, 1975; Linn and Burstein, 1977; Lohnes,
1972; Wiley, 1970).

14. Overall between student analyses are weighted averages of
between class (school) and pooled within class (school) analyses
and thus are rarely advisable in educational contexts (Cronbach,
1976).

The points cited above are generally compelling, and disagreements 
are virtually unresolvable if a choice of either pupil or class or school
as the only unit of analysis is required. Moreover, those analysts who
resort to theoretical justifications either reject plausible alternative 
theories or find themselves unable to choose. Picking the appropriate 
unit on the basis of statistical considerations (sampling, dependence
of observations) can also leave the choice unresolved because of
competing alternatives (Burstein, 1975a, 1978b; Burstein and Smith,
1977; Glendening, 1976; Haney, 1974). 

Concerns about units of analysis in the evaluation of Project 
Follow Through exemplify the dilemma. Haney (1974) cites four 
general types of considerations: (1) the purpose of the evaluation 
(questions to be addressed), (2) the evaluation design (nature of 
treatments, independence of units and treatment effects, appro- 
priate size), (3) statistical considerations (reliability of measures, 
degrees of freedom, analysis techniques), and (4) practical con- 
siderations (missing data, policy research, multiple year comparisons, 
economy). Haney was unable to choose among units because the 
purpose of the evaluation dictated the child as the unit but the unit 
of treatment was the classroom; moreover, the multiyear character
of Follow Through made classrooms impractical as units of analysis. 

Cronbach (1976:1.3a-1.19) also considers the units of analysis 
problem as presented by the Cline et al. (1974) report on Follow 
Through. Cline et al. had analyzed the data at the individual, class, 
and school levels but emphasized the school level in their report. 
Cronbach (1976:1.5) supports this emphasis on the grounds that 
treatments were assigned to schools and program delivery probably 
varied from school to school. Thus, Cronbach’s choice puts him at 
odds with Haney (1974), although they both recognize the prob- 
lems in arriving at a choice. 

Thinking of the analysis of multilevel data as a problem in the 
choice of a unit of analysis is not a very penetrating perception. 
Phenomena of importance occur at all levels and need to be de- 
scribed and subjected to inference making (Burstein, 1978a; Burstein 
and Linn, 1976; Burstein, Linn, and Capell, 1978; Cronbach, 1976). 
Haney’s arguments on this issue are succinct and to the point: 


Investigators ought to have a strong bias for studying various properties 
of the educational system at the level at which they occur; . . . variation
in attributes of interest ought to be studied at those levels (or between
those units) at which it does (or is expected to) occur. . . . If the
hypotheses are explicitly stated in terms of mathematical models, the 
impact of shifting levels of analysis from one unit of analysis to another 
will be much more easily assessed than if they are not. (1974:96-97) 


Given the above, an emphasis on the choice of a substantive 
analytical model rather than on making a choice among competing
units of analysis is the more reasonable and tenable position to adopt
if an investigator's goal is to learn something about the question one
set out to answer. Moreover, the multilevel character of educational
data requires analytical models suited to the identification of
educational effects at and within each level of the educational system.
Thus, it is arguable that analyses of educational data should be
conducted at more than a single level and that when questions about the
outcomes of groups of individuals are to be addressed, measures in
addition to group means require attention. These points are
considered in the remainder of the chapter.


Different Questions Addressed and Different
Variables Measured at Different Levels

Results from analyses at the group and individual levels often con- 
flict because the analyses bear on different substantive questions 
(Cronbach, 1976:1.9; see also Scheuch, 1966, and the papers in
Dogan and Rokkan, 1969). The decision about which questions the
investigator wants to address remains to be made.


Examples of Group Level Questions. While we have argued that 
models with individual level outcomes should be the primary method 
for educational effects research, we can envision situations in which 
the policy questions of interest warrant a different choice. The California
Assessment Program and the issue of school finance reform in
California provide two illustrative cases. One purpose of the California
Assessment Program is to inform school districts about the
adequacy of the achievement of the district's schools. The testing
program can also provide data to help districts single out “unusually”
poor achieving schools for special remedial services. Neither of these
uses of the data seems to require the specification of pupil level
educational effects models. 

The pertinence of school or school district questions to the school
finance reform issue is based on similar reasoning. Policymakers can 
be expected to ask how the reorganization of educational finance in 
California will equalize resources available to schools and school 
districts and whether any changes in resource allocations influence 
educational outcomes at the school and district level. These are 
clearly questions dealing with educational organizations, and they 
require the analyst to seek answers at the level at which they are
addressed. Hannan, Freeman, and Meyer (1976) present a related
example about the influence of administrative intensity on
organizational effectiveness.

In citing these examples, we have not forgotten our earlier assertion
of the prominent role of individual level data. Each example
represents a paradigm shift away from the questions that guided
Averch et al. (1972) and that dominate educational effects research. 
Nonetheless, the question each example addresses has important 
policy implications. Moreover, manipulation of macrolevel educa- 
tional conditions in response to these questions may alter the pro- 
cesses operating at the microlevel. 


Changes in Variable Meaning. The process of aggregating data 
across group members can change the meaning of variables that 
enter a particular regression model at different levels (Burstein, 
1975a; Burstein, Fischer, and Miller, 1978; Burstein and Miller, 
1978; Cronbach, 1976; Scheuch, 1966, 1969). That is, measured 
variables in studies of educational effects typically serve as indicators 
for some latent construct (ability, background quality, educational 
quality). By aggregating a specific variable over students within 
schools (classrooms), we can change the latent construct for which
the variable serves as a proxy. 

An example of the shift in meaning across levels is the variable 
father's occupation. Father's occupation is perhaps the most fre- 
quently used indicator of the home and socioeconomic background 
of a pupil. At the level of the individual, it conveys the parental in- 
vestment (nurturance, environmental support for academic achieve- 
ment) in the child's learning efforts. This investment can directly 
affect the academic motivation of the child by supplementing or 
detracting from in-school learning efforts. In this sense, father's 
occupation and other family background measures represent both a 
base and a supplement for the educational process. 

When aggregated over the pupils that attend a given school, 
measures of family background describe community character- 
istics (wealth, urbanism, etc.). Differences among schools in aggre- 
gate indexes of background can also reflect variation in social policies 
governing the organization, content, and administration of education.
These social policies can also determine the way in which the
distribution of social and economic resources outside the school
constrains resource allocation to education.


More concretely, 
tion affect the type 
the school, and the

Within schools (classrooms), a pupil's family background can take
on a different meaning through its role in the conferral of status 
(Alexander, Cook, and McDill, 1978; Heyns, 1974; see also the dis- 
cussion of frog-pond effects above). The student's relative socio- 
economic background can affect both teacher and peer behaviors 
toward the child. Likewise, relative background can cause the 
student to respond differently out of a sense of relative advantage or
disadvantage. 

Recent empirical work with data from the IEA studies (Burstein, 
1977; Burstein, Fischer, and Miller, 1978; Burstein and Miller, 1978) 
illustrates the distinct differences in interpretation of family back- 
ground effects, particularly when one moves from a between school 
to a within school analysis. In this investigation of science achieve- 
ment data, between school and corresponding pooled within school ed- 
ucational effects models for fourteen year olds from the United States 
and Sweden were compared. The effects of family background were 
substantial, as usual, in the between school analysis of the U.S. data,
but much smaller in the same analysis of Swedish data (Table 3-8).
In fact, in Sweden R² between students was larger than R² between schools,
which would not occur in analyses of typical U.S. data. In contrast,
the effects of family background in the pooled within school analyses
for the United States were substantially smaller than they were in the
between school analyses. In fact, the within school effects for
background and ability in the United States were essentially the same as
the effects found in the within school analysis of Swedish data.

The results from the between schools analyses were attributed to 
differences between the countries in the social policies governing 
the relationship between pupil backgrounds and school resources:
the predominance of local control and community determination
of resources in the United States versus national control and a
policy of uniformity of resources (e.g., curriculum offerings) in
Sweden. The within school results suggest that the role of pupil
background and ability in the interpersonal allocations of rewards
operates similarly within American and Swedish schools. Within
school reward mechanisms appear to be resistant to the factors
governed by the social policy orientation of the country (e.g.,
curriculum diversity).

Empirical analyses from two national educational systems were
juxtaposed to highlight the potential benefits of considering variables
measured at different levels as reflecting different constructs.
Whether treating different states or different large school districts
in the same fashion will yield equally striking distinctions across
levels and across comparison groups within levels remains to be seen.


Dependencies among Observations within Groups

The problem of dependencies (correlations) among observations 
within groups is endemic to research on hierarchically nested school 
data and can be especially critical when intact classrooms are in- 
vestigated. Cronbach and Webb (Cronbach, 1976; Cronbach and 
Webb, 1975; Webb, 1977) have argued that when intact groups are 
assigned to instructional treatments, the students in those treatments 
cannot be considered as independent units. Therefore, the typical 
analyses based on all individuals pooled across groups can be justi- 
fiably criticized. 

There are basically two issues associated with dependencies among
observations: the effects of dependence on standard statistical
analyses and the substantive interpretation of dependence as
information on educational processes within groups. In the statistical
literature, robustness to within group dependence is of primary
interest; the dependence is considered to be a nuisance. For educational
data, however, dependence may provide important substantive
information. Different kinds and different measures of within group 
dependence are discussed below. 


Dependence and Statistical Analysis. Educational treatments are 
not administered independently to individuals; individuals within 
the classroom have shared experiences. This dependence among 
individuals within the group can be expressed by an intraclass
correlation structure. The consequences of ignoring this intraclass
structure (i.e., treating individuals as independent by ignoring group
membership) are serious (Walsh, 1947; Weibull, 1953).

A thorough discussion of the problem of dependency among observations
in the experimental design frame of reference is provided
by Glendening (1976). Glendening simulated the effects of violating
the assumption of independence within the context of a balanced,
two level hierarchically nested design, with subjects nested within
classrooms and classrooms nested within treatments. She found that
a model with the pupil as the unit or a conditional model where a
preliminary test of independence is followed by a choice of the
appropriate unit of analysis for testing treatment effects yielded
spuriously small error terms and, therefore, led to liberal tests of
treatment effects. Glendening concluded that the researcher must
choose a priori between the class (dependence) or student (independence)
as the unit, but acknowledged the complications of obtaining
prior knowledge about independence of response.

The statistical problem considered by Glendening also arises in 
the regression context. The kinds of dependencies considered so far
are reflected in correlations among the residuals within groups.
Obviously, such dependencies have serious consequences for the
robustness of statistical tests of educational effects.
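The size of the problem can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than a reconstruction of any analysis cited here: the scores are hypothetical, the function names anova_icc and design_effect are invented for the example, and the design is balanced (equal class sizes). The intraclass correlation is estimated from the one-way analysis of variance mean squares, and the design effect 1 + (n - 1)ρ, a standard sampling result that is not stated in the text, indicates by how much the variance of a mean is understated when pupils in classes of size n are treated as independent.

```python
def anova_icc(classes):
    """One-way ANOVA estimate of the intraclass correlation (balanced design)."""
    k = len(classes)           # number of classes
    n = len(classes[0])        # pupils per class
    if any(len(c) != n for c in classes):
        raise ValueError("this sketch assumes equal class sizes")
    means = [sum(c) / n for c in classes]
    grand = sum(means) / k
    # Mean square between classes and pooled mean square within classes.
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((v - m) ** 2 for c, m in zip(classes, means) for v in c) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)


def design_effect(icc, n):
    """Variance inflation for a mean when n pupils per class share correlation icc."""
    return 1 + (n - 1) * icc


# Hypothetical achievement scores for three intact classes of four pupils.
classes = [[10, 12, 11, 13], [15, 17, 16, 18], [20, 22, 21, 23]]
icc = anova_icc(classes)
deff = design_effect(icc, n=4)
n_pupils = sum(len(c) for c in classes)
print("intraclass correlation:", icc)
print("design effect:", deff)
print("effective number of independent pupils:", n_pupils / deff)
```

With these made-up scores almost all of the variation lies between classes, so the twelve pupils carry little more information than the three classes do. A pupil level test that ignores this would use an error term that is too small, which is the liberal bias Glendening describes.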


Dependence as Information on Educational Processes. Undoubtedly,
research on most educational phenomena will involve dependent
observations. But dependency is not an all or none phenomenon; it
is a matter of degree. It is a function of what is being measured (the
outcomes) and the “treatments” or “causes” under study. It is also 
a function of the composition of the groups and the nature of the 
grouping mechanism (Webb, 1977; also see below). 

Rather than assume complete dependency or independency, it 
seems more reasonable to acknowledge that dependencies exist, but 
dependency may vary across groups or even among pairs of persons 
within groups. This line of reasoning shifts the focus to efforts to 
measure within group dependency directly. In this way, the investi- 
gator can adjust standard analyses of treatment effects for actual 
dependencies among observations and/or assess the antecedents of 
within group dependencies and the direct effects of dependencies on 
educational outcomes. In the latter sense, the variation of dependency
across groups characterizes within group processes in a convenient
metric that can in turn be used to explain educational
performance.


We can approach the question of the source and consequences of
within group dependency from both theoretical and empirical perspectives,
focusing on the class as the group of interest. At a theoretical
level, it is important to distinguish between “additive” effects and
"proportional" effects (Glendening, 1976). Additive effects are 
shifts in the level of performance. Thus, between teacher differences 
(within the same treatment condition) are additive effects that will 
be assessed as intraclass relationships using standard methods of 
estimation. However, each child could be taught independently of 
all others taught by the same teacher. In this case, the estimated 
intraclass relationship is not a result of interpersonal relationships; 
instead, it is only an indicator of differing levels of teacher effec- 
tiveness. 

Proportional effects are changes in the variation within a group
(class), making it either more or less heterogeneous than a random
collection of independent units. Proportional effects may result
from interpersonal interactions. That is, if students were taught
independently, they should not tend to increase or decrease in
variability except when an additive effect creates a pseudoproportional
effect (e.g., by raising one group to a ceiling level on the outcome
measure).

In concrete terms, the distinctions between additive and proportional
within class dependencies revolve around the uniformity or
variability of the effects. If Teacher A taught the entire class how to
work a specific type of algebra problem that no student would have
been able to answer otherwise, an additive effect for Teacher A has 
been induced—that is, every student's score is incremented by the 
same amount. If, instead, some of Teacher A's students had bene- 
fited from being taught but others had not (due to inattention, lack 
of ability to comprehend the instruction, or whatever), there is a 
proportional effect from being taught by Teacher A. Students were 
affected differently according to their attention or ability. 

Unfortunately, the classical method for estimation of the de- 
pendency among observations within groups does not distinguish 
between additive and proportional effects. Either could be the source 
of the observed intraclass correlation. Efforts to assess directly
the dependency among observations within each class would be of 
value. 
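The Teacher A example can be made concrete with a small numerical sketch in Python. All of the numbers are hypothetical, as are the variable names and the device of flagging which pupils were attentive. Under the additive effect every score rises by the same amount; under the proportional effect only the attentive pupils gain. Both raise the class mean, and so both would contribute to differences between classes, but only the second changes the spread of scores within the class.

```python
from statistics import mean, pvariance

# Hypothetical scores the four pupils would have earned without Teacher A.
baseline = [40, 50, 60, 70]
# Hypothetical flags for which pupils attended to the instruction.
attentive = [False, False, True, True]
gain = 10  # hypothetical gain from being taught the algebra problem

# Additive effect: every pupil's score is incremented by the same amount.
additive = [score + gain for score in baseline]

# Proportional effect: only pupils who attended benefit from the teaching.
proportional = [score + gain if flag else score
                for score, flag in zip(baseline, attentive)]

for label, scores in [("baseline", baseline),
                      ("additive", additive),
                      ("proportional", proportional)]:
    print(label, "mean:", mean(scores), "within class variance:", pvariance(scores))
```

A summary that looks only at class means sees a gain in both cases. Comparing within class variances across the conditions is one simple way to begin separating the two kinds of effect, subject to the ceiling caveat noted above.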

Present methods of characterizing the homogeneity of instruction 
within classrooms may be insensitive to important educational effects.
To portray a class as largely individualized or teacher directed
does not convey how classroom structuring and interpersonal dynamics
(peer relations, teacher praise, group cohesiveness) interact to
form the educational treatment received by each student. Better
measures of similarity (or dissimilarity) of instruction across students
within classrooms need to be developed. Perhaps a measure of within
class dependency can capture certain aspects of this element of 
educational effects. 

When carefully measured, within class dependency indexes are 
surely more proximal descriptors of classroom processes than are
designations such as individualized, open, or traditional. The British
study of open education by Bennett (1976) would have certainly
been less provocative and more informative if it had approached 
the differences in instructional styles from this perspective. 

There is no general method for estimating within group dependency
in a naturalistic study. Even the most careful data collection
efforts in actual classrooms barely scratch the surface as far as the
question of degree of dependency in instruction is concerned. A
logical next step might be exploratory studies with empirical data
like that collected by the Far West Laboratory (FWL) in the Beginning
Teacher Evaluation Study (BTES) (Fisher et al., 1978).

The FWL collected extensive and detailed information on a
sample of pupils and their learning conditions and activities from
second and fifth grade classrooms. The data include daily teacher
logs; allocations of time in specific content areas in reading and
mathematics for each sample pupil; observational data on instructional
topics, difficulty of material, instructional settings, and
learning and instructor moves for the same sample of children for 
whom the teachers kept logs. Over the school year, there were ap- 
proximately twenty-two days of observation per classroom. 

The FWL reports on instructional time allocations and teacher be- 
haviors (cf., e.g., Dishaw, 1977; Filby and Fisher, 1977) reflect both 
diversity of practices across classrooms and variation in the
homogeneity of practices within classrooms. For example, in fifth grade
math instruction, there were classes with high but uniform average
daily time allocated to mathematics (e.g., a mean of sixty-four
minutes with a standard deviation of two), classes with low but uniform
allocations (e.g., twenty minutes with a standard deviation of
two), and classes with allocations that varied across children (e.g.,
averages of thirty-three and forty-seven minutes with standard
deviations of thirteen minutes). Within specific content areas, between
class and within class fluctuations in allocated time were also
dramatic: from classes with an average of one minute per day on
fractions (s.d. = 1) to a class with twenty-three minutes per day
(s.d. = 1) to yet another class with twenty-nine minutes per day
(s.d. = 221). The percentage of total time in small group and whole
class settings varied from 9 to 51 percent for these classes, while the 
percentage of total time devoted to substantive teacher behaviors 
(explanations, academic monitoring, and feedback) varied from 7 
to 33 percent. As a group, the types of indicators described from the 
BTES study and the variation in practices exhibited by the data 
suggest that this data source is a fruitful point of departure for in- 
vestigating within class dependencies. 
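A first exploratory step with data of this kind might be to summarize each class by the mean and standard deviation of allocated time across its sample pupils. The Python sketch below shows the idea. The minutes, the class labels, the function name summarize_allocations, and the five-minute cutoff for calling a class "uniform" are all assumptions made for illustration; none of them comes from the BTES reports.

```python
from statistics import mean, stdev


def summarize_allocations(minutes_by_class, uniform_sd=5.0):
    """Summarize daily allocated minutes per class.

    uniform_sd is a hypothetical cutoff: a class whose standard deviation
    across pupils is at or below it is labeled as having uniform allocations.
    """
    summary = {}
    for name, minutes in minutes_by_class.items():
        sd = stdev(minutes)
        summary[name] = {
            "mean": mean(minutes),
            "sd": sd,
            "uniform": sd <= uniform_sd,
        }
    return summary


# Hypothetical average daily minutes of mathematics for four sample pupils per class.
classes = {
    "high_uniform": [62, 64, 66, 64],
    "low_uniform": [18, 20, 22, 20],
    "varied": [20, 30, 46, 36],
}

summary = summarize_allocations(classes)
for name, stats in summary.items():
    print(name, stats)
```

The class mean describes the level of the treatment, while the standard deviation describes how far pupils in the same class actually shared it. The second quantity is the one that bears on the degree of within class dependency.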


In terms of the consequences of dependencies, there are potential
benefits from experimental studies of the effects of learning in group
and in individual settings. Webb (1977) compared learning in interacting
groups and learning singly, attempting to explain differences
as a function of the characteristics of the individual, the group, and
the group process. Her group process results provided a key to
understanding why some students learned best in interacting groups,
whereas others did best learning singly. In general, group members
who actively participated in discussions did better than those who
did not, and they did at least as well as after individual learning.
Whether a pupil actively participated was related to the pupil’s
ability ranking within the group and to the range of ability in the
group. Knowing the abilities of the students in a group, one could
predict fairly well who interacted with whom and, consequently,
who did best.

The results of this highly structured study suggest that knowledge
of group processes in a particular class is crucial for understanding
the degree to which students are working together, and therefore
crucial for estimating degree of dependence in the class. Webb (1977)
suggests that her procedures may be generalized to real teacher
taught classrooms, considering the interactions between teachers and
students, interactions among students, and characteristics of classmates
(abilities, personality variables) and teachers. In the long run,
one hopes to be able to predict student performance from a com- 
bination of these variables. 


CHOICE OF AN APPROPRIATE
ANALYTICAL MODEL 


In the preceding sections, a case has been made that consideration of 
the multilevel properties of educational data is both a central and a 
complex part of the specification of educational effects. To be sure, 
the complications due to cross-level inference, the identification of 
group effects on microlevel outcomes, and the determination of 
appropriate units of analysis are important questions. But each
should be simply a part of a more fundamental activity: the development
of an adequate theory of educational processes and the determination
of analytical methods for identifying the effects of such
processes.

Thus, as with most social science inquiry, appropriate analyses of 
multilevel educational data depend on the investigator's ability to 
(1) identify the questions of interest, (2) elaborate a theory that 
can provide evidence about these questions, and (3) determine analytical
methods suitable for investigating the theory. In the remainder
of this chapter, we focus on theoretical considerations in
determining suitable analytical methods for multilevel educational 
data and discuss briefly several approaches that may be useful. 


Factors Influencing Choice of Analytical Model 

Although there are numerous technical considerations involved in
the specification of educational effects, the choice among analytical
models need not be a methodological decision. In the final analysis,
the substantive questions being addressed determine the choice of
a model. The types of processes and the types of outcomes under
investigation, as well as the type of study, enter into the decision
process.

For example, what may be a reasonable analytical model for the 
effects of instruction or of a program on a short-range outcome
(e.g., an achievement test at the end of an instructional unit) loses its
salience when generalizations, especially time-dependent ones, are 
desired (Burstein, 1978a; Wittrock in Wittrock and Wiley, 1970). 
Nevertheless, instructional psychologists interested in the impact of 
a well-circumscribed instructional unit should have little difficulty 
selecting a model. 

On the other hand, the sociologist studying school effects, the 
evaluator investigating program effectiveness, or the stratification 
theorist focusing on the educational determinants of socioeconomic 
achievement deal with a more complex educational history. This 
history cannot be simply described by instruction in a single class- 
room or, for that matter, in a single school at a given time. More- 
over, when the phenomena under study are dynamic rather than 
static, the required exercise in model specification is much more 
difficult. 

Another distinctly substantive constraint on the specification of 
analytical models can be found in comparing educational effects re- 
search in elementary schools with similar research in secondary 
schools. In the elementary grades, instruction in a single classroom
with a specific teacher and a complement of classmates is the norm. 
Moreover, at least in the lower grades, effects of such factors as 
relative status and peer group aspiration are weak if they exist at 
all. Most of the effects of schooling on the educational performance 
of grade school children are mediated by the teacher. The degree 
to which the teacher organizes an instructional program to meet 
student needs and to motivate participation determines the edu- 
cational performance of the student over and above the student's 
entering levels of ability and motivation. 

The forces influencing individual educational performance are 
different in secondary schools. Students receive instruction from 
several teachers. Thus, performance in one subject (e.g., physics) may 
be influenced by the quality of instruction in another (e.g., mathe- 
matics). Curriculum diversity (the curricular options available to the 
Student) also enters the picture. For example, public-speaking skills 
cannot be affected by Speech classes if such classes are not offered. 
Peer group influence, extracurricular activities, and school spirit 
might also affect the performance of high school sti 


udents. 
Clearly, models for analyzing educational effects at the elementary 


level need to emphasize accurate assessment of the role of the class- 
room and the teacher and to be sensitive to the dependencies associa- 
ted with instruction in intact groups. Classroom level and within 
classroom analyses are likely to be required. In contrast, educational 
effects models at the secondary level would perhaps include the 
instructional investments from several courses, measures of curricu-
lum diversity, peer influences, and indexes of school "atmosphere."
The dependencies among students associated with specific classes 
and teachers are probably weak, but school level dependencies may 
be strong. Students take different sequences of courses during the 
day, but are constrained by the available offerings and overall cli- 
mate. School level and pooled within school analyses deserve to 
be emphasized, although for certain types of outcomes, class level 
and pooled within class analyses should be conducted. 

An endless number of additional examples can be provided, but 
the point is clear. The problems associated with the analysis of 
multilevel data vary according to the type of study, the types of 
outcomes, and the types of processes under investigation. More-
over, much remains to be learned about choosing the appropriate
analytical models in some areas (e.g., nonexperimental longitudinal
studies of teacher effects).


Decomposition of Educational Effects 

It was asserted above that analyses of multilevel data should be 
conducted at more than a single level. Also, in the substantive dis- 
cussion of contextual and frog-pond effects, we described processes 
that may affect group level outcomes, within group outcomes, or 
both. In this section, a general discussion of the decomposition of
educational effects into those between groups and those within 
groups is presented, and several approaches to analyzing data in this 
fashion are described briefly. To simplify matters, only a two level 
(pupils and classes) model is considered, and all equations are ex-
pressed in terms of population parameters. Perfect measurement of
all variables is also assumed. 

Once membership in a specific class is acknowledged (i.e., instruc-
tion from a specific teacher), any measure that varies over pupils can
be decomposed into its between class (teacher) and within class
(teacher) components. That is, the posttest or outcome performance,
$Y_{ij}$, of pupil $j$ in class $i$ ($j = 1, \ldots, n$ persons per class; $i = 1, \ldots, K$
classes; for simplicity, we assume equal-sized classes) can be de-
composed into

$$
\underbrace{Y_{ij}}_{\text{individual outcome}}
= \underbrace{\mu_Y}_{\text{grand mean}}
+ \underbrace{(\mu_{Y_i} - \mu_Y)}_{\text{between class effect for class } i}
+ \underbrace{(Y_{ij} - \mu_{Y_i})}_{\text{within class effect per person } ij}.
\tag{3.19}
$$

If, in addition, we consider the performance level, $X_{ij}$, of the pupil
prior to entering the class (i.e., the pretest or some measure of
entering ability on the same scale as $Y$), the relation of $X_{ij}$ to $Y_{ij}$ is
given by

$$Y_{ij} = \mu_Y + \beta_t (X_{ij} - \mu_X) + \epsilon_{tij},$$

where $\beta_t$ is the coefficient from the between student (total) regres-
sion of $Y_{ij}$ on $X_{ij}$.

The regression of $Y_{ij}$ on $X_{ij}$ can also be decomposed into between
class and within class components. Following Cronbach (1976:
3.1-3.11), this decomposition can be written as

$$
\begin{aligned}
Y_{ij} = {} & \mu_Y + \beta_b(\mu_{X_i} - \mu_X) && \text{predicted between class effect} \\
& + \left[(\mu_{Y_i} - \mu_Y) - \beta_b(\mu_{X_i} - \mu_X)\right] && \text{adjusted between class effect} \\
& + \beta_w(X_{ij} - \mu_{X_i}) && \text{pooled within class effect} \\
& + (\beta_i - \beta_w)(X_{ij} - \mu_{X_i}) && \text{specific within class effect} \\
& + \epsilon_{ij}. && \text{specific residual associated with person } ij
\end{aligned}
\tag{3.20}
$$

In the above equation, $\beta_b$ is the between class slope from the regres-
sion of $\mu_{Y_i}$ on $\mu_{X_i}$, $\beta_w$ is the pooled within class slope from the re-
gression of $Y_{ij} - \mu_{Y_i}$ on $X_{ij} - \mu_{X_i}$ across all classrooms, and $\beta_i$ is the
specific within class slope from the regression of $Y_{ij}$ on $X_{ij}$ within
the $i$th classroom.

Each of the four components of the decomposition reflects the in-
fluence of educational processes (between classes, within classes, or
both). Possible substantive interpretations of each component are
offered below.
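
The decomposition can be checked numerically. The sketch below is illustrative only: it substitutes sample means and least-squares slopes for the population parameters assumed in the text, uses simulated data with equal-sized classes, and the function name `decompose` is a hypothetical label, not part of any cited method.

```python
import numpy as np

def decompose(x, y):
    """Split y (shape K x n) into the additive components of equation (3.20).

    Sample means and least-squares slopes stand in for the population
    parameters used in the text (an assumption of this sketch).
    """
    mu_x, mu_y = x.mean(), y.mean()
    mu_xi = x.mean(axis=1, keepdims=True)   # class means of input
    mu_yi = y.mean(axis=1, keepdims=True)   # class means of outcome
    dx, dy = mu_xi - mu_x, mu_yi - mu_y     # between class deviations
    xw, yw = x - mu_xi, y - mu_yi           # within class deviations

    beta_b = (dx * dy).sum() / (dx ** 2).sum()            # between class slope
    beta_w = (xw * yw).sum() / (xw ** 2).sum()            # pooled within class slope
    beta_i = (xw * yw).sum(axis=1, keepdims=True) / (xw ** 2).sum(axis=1, keepdims=True)

    ones = np.ones_like(y)
    return {
        "grand_mean": mu_y * ones,
        "predicted_between": beta_b * dx * ones,
        "adjusted_between": (dy - beta_b * dx) * ones,
        "pooled_within": beta_w * xw,
        "specific_within": (beta_i - beta_w) * xw,
        "residual": yw - beta_i * xw,
    }

# Hypothetical data: K = 8 classes of n = 20 pupils each.
rng = np.random.default_rng(0)
K, n = 8, 20
x = rng.normal(50, 10, size=(K, n)) + rng.normal(0, 5, size=(K, 1))
class_slopes = rng.uniform(0.3, 0.9, size=(K, 1))
y = 10 + class_slopes * x + rng.normal(0, 4, size=(K, 1)) + rng.normal(0, 3, size=(K, n))

parts = decompose(x, y)
total = sum(parts.values())   # the six components add back to y
```

Because the components are defined as successive differences, their sum reproduces each pupil's outcome exactly; the substantive question is how much variance each component carries.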


Between Class Slopes. The magnitude of $\beta_b$, the between class
slope, provides an indication of the extent to which initial differ-
ences in the mean performance levels in a set of classrooms are main-
tained, exaggerated, or reduced at a later measurement occasion. If
positive, as it typically would be, $\beta_b$ reflects the tendency for classes
with high mean inputs to have high mean outcomes. A low value
of $\beta_b$ implies little dependence of class mean outcomes on average
initial performance. A weak influence of entering characteristics can
be the result of a "compensatory effect," whereby instructional re-
sources are allocated between classes in an equalizing or redistribu-
tive manner.


Adjusted Between Class Effects. The adjusted between class ef- 
fects reflect the impact of the specific teacher or class on the mean 
outcome of its pupils after controlling for the effects of mean inputs 
on mean outcomes. Included in adjusted class effects are the main 
effects of the teacher, plus such things as unusual group cohesiveness 
or disharmony, high or low quality program delivery (e.g., well-
structured, individualized instruction), and average error of measure-
ment of $Y_{ij}$ for the class.

In any event, large positive adjusted classroom effects may be an
indication of exemplary educational practices. Whatever analysis
strategy is employed should generate accurate estimates of this
teacher-class effect on mean outcome and perhaps should identify
generalizable characteristics of teachers or classes achieving large
effects.


Pooled Within Class Slopes. The pooled or common within class
slope, $\beta_w$, reflects the tendency across all classes for students above
the class average on input to do better or worse than the rest of the
class on the outcome measure. The interpretation of $\beta_w$ parallels that
of $\beta_b$; the former refers to the consistent tendencies in within class
processes and the latter deals with corresponding tendencies for class
averages.

Estimation of the common within class regression is of value in the
study of teacher-class effects. The pooled within class slope, $\beta_w$,
provides an indication of the overall redistributive properties of class-
room instruction for the school population represented in the study.
If policymakers and educators are intent on raising the relative per-
formance of youth of low ability and low socioeconomic status
through specific instructional practices, $\beta_w$ reflects the magnitude of
the problem they will encounter.


Specific Within Class Slopes. Differences among educational pro-
grams in their pooled within class slopes can be given substantive
interpretations and can be the basis for policy decisions. The
interpretation of slope differences proffered here falls within the
domain of aptitude-treatment interaction (ATI) research (Cronbach,
1976; Cronbach and Webb, 1975; Snow, 1976). The logic of ATI
research is built on the substantive significance of differences in
within treatment slopes.

We can carry the ATI logic one step further to the level of the in-
dividual classroom. The within class regression of pupil outcome on
pupil inputs (denoted by $\beta_i$) is likely to vary across classrooms.
Cronbach (1976:316) cites as sources of the variation in $\beta_i$: (1)
sampling variability due to chance and stability problems due to 
small class sizes when the processes operating in the classes are the 
same, (2) differences in the selection factors forming the classes, and 
(3) differences in causal processes going on in the classrooms. This 
last source encompasses the possibility that some teachers or classes 
have relatively greater compensatory (redistributive) effects than 
others. 

Variation in the $\beta_i$ is a potentially important source of informa-
tion for researchers and policymakers, especially when such in-
formation is combined with the adjusted between class effects
discussed earlier. Specific within class slopes gain salience in educa-
tional research primarily because the $\beta_i$ are class level indexes that
may reflect actual within class processes. The isolation of classroom
process variables that are associated with the magnitude of $\beta_i$ could
have considerable policy implications. In a later section on alterna-
tive measures of group outcomes, we consider heterogeneous within
group slopes along with other indexes of group outcomes and elab-
orate further our view of how heterogeneous slopes can arise in
practice.


Selected Analytical Strategies for Multilevel Data 

The above discussion highlights the complexities in specifying ed- 
ucational effects in multilevel educational data. The investigator who 
conducts only a between group, between student (total), or pooled 
within group analysis will fail to identify (or will misspecify) certain 
educational effects—assuming all four types exist. 

Several investigators (Burstein and Linn, 1976; Burstein, Linn, 
and Capell, 1978; Cronbach and Webb, 1975; Hannan and Young, 
1976b; Keesling, 1976; Keesling and Wiley, 1974; Young and Er-
bring, 1976) have suggested approaches for analyzing multilevel data
that identify combinations of the components from the decomposi-
tion. Some approaches are heuristic or pragmatic, while others are
based on strong theories about the educative process. Several of the
approaches (Burstein and Linn, 1976; Hannan and Young, 1976b;
Keesling, 1976; Young and Erbring, 1976) were first presented at an
NIE conference, "Methodology for Aggregating Data in Educational
Research." Unfortunately, little is known about these approaches,
because only limited additional analytical and empirical work has
been conducted. Each approach is briefly described below.


Between Group, Pooled Within Group Analysis (Cronbach, 1976;
Cronbach and Webb, 1975). Cronbach (1976:10.3 ff.) asserts that
the usual overall between student analysis combines two kinds of
relationships (those operating between collectives and those opera-
ting among persons within collectives) into a composite that is
rarely of substantive interest. In fact, as noted by Cronbach, $\beta_t$, the
overall between student regression, has been shown by Duncan,
Cuzzort, and Duncan (1961) to be a composite of the between group
regression, $\beta_b$, and the pooled within group regression, $\beta_w$. There-
fore, Cronbach (1976; Cronbach and Webb, 1975) recommends that
the between group effects and individuals within group effects be ex-
amined separately.
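
The composite character of the total regression can be illustrated with a short computation. The sketch assumes the familiar covariance-partition form of the result, in which the weights are the between class share of input variance (the correlation ratio, here called `eta2`) and its complement; that form, the simulated data, and the function name `three_slopes` are assumptions of this example rather than details given in the text.

```python
import numpy as np

def three_slopes(x, y):
    """Return (beta_t, beta_b, beta_w, eta2) for x, y of shape (K, n)."""
    n = x.shape[1]
    mu_xi = x.mean(axis=1, keepdims=True)
    mu_yi = y.mean(axis=1, keepdims=True)
    xt, yt = x - x.mean(), y - y.mean()            # total deviations
    xb, yb = mu_xi - x.mean(), mu_yi - y.mean()    # between class deviations
    xw, yw = x - mu_xi, y - mu_yi                  # within class deviations

    beta_t = (xt * yt).sum() / (xt ** 2).sum()
    beta_b = (xb * yb).sum() / (xb ** 2).sum()
    beta_w = (xw * yw).sum() / (xw ** 2).sum()
    # Share of the input sum of squares that lies between classes.
    eta2 = n * (xb ** 2).sum() / (xt ** 2).sum()
    return beta_t, beta_b, beta_w, eta2

# Hypothetical data in which between and within class slopes differ.
rng = np.random.default_rng(1)
K, n = 12, 25
class_input = rng.normal(0, 6, size=(K, 1))
x = 50 + class_input + rng.normal(0, 8, size=(K, n))
y = 20 + 1.2 * class_input + 0.5 * (x - 50 - class_input) + rng.normal(0, 3, size=(K, n))

beta_t, beta_b, beta_w, eta2 = three_slopes(x, y)
composite = eta2 * beta_b + (1 - eta2) * beta_w   # equals beta_t with equal class sizes
```

When the between and within class slopes differ, the total slope lies between them at a position fixed by how the input variance happens to be split, which is why it rarely answers a substantive question on its own.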

With a single class level measure, $T$, a single measure of student
input, $X$, and a single student level outcome measure, $Y$, Cronbach's
strategy would require the inspection of the following two regression
equations:

$$\mu_{Y_i} = \mu_Y + \beta_{\mu_Y \mu_X \cdot T}(\mu_{X_i} - \mu_X) + \beta_{\mu_Y T \cdot \mu_X}(T_i - \mu_T) + u_i \tag{3.21}$$

$$Y_{ij} = \mu_{Y_i} + \beta_w(X_{ij} - \mu_{X_i}) + \epsilon_{wij}. \tag{3.22}$$

In the above, $\beta_{\mu_Y \mu_X \cdot T}$, $\beta_{\mu_Y T \cdot \mu_X}$, and $\beta_w$ parallel or equal three of
the components in the decomposition (the between class slope, the
adjusted class effect, and the pooled within class slope). Cronbach's
report does consider specific within group regressions (1976:3.2 ff.,
5.5 ff., 8.1 ff., and elsewhere), but mostly indirectly. Cronbach con-
cludes (1976:5.5 ff.) that in Anderson's ATI study, the specific
within class regressions do not account for much variance (4.1 per-
cent in the "drill" condition and 6.9 percent for the "meaning"
condition). Yet, the portions of outcome variance attributable to
adjusted class effects are also small (5 percent for drill and 6.9
percent for meaning).

Thus, although Cronbach emphasizes the consideration of be-
tween group and pooled within group effects, his report clearly
recognizes the significance of heterogeneous within group regres-
sions. At the same time, he offers no strategy for a systematic
examination of the determinants of differences in within class
slopes. In the studies he considered, differences in within class
slopes that appeared to be significant were generally traceable to
anomalous students (e.g., Cronbach, 1976:5.21-22).


Regression Models for Hierarchical Data (Keesling and Wiley,
1974; Wiley, 1976). Keesling and Wiley (1974) proposed a model
for disentangling the effects of variables defined solely at the school
level from those defined at the pupil level. (We can talk about classes
rather than schools without loss of generality.) The process of dis-
entanglement involved two stages: (1) the adjustment of the effects
of individual background characteristics on outcome for the effects 
of the classrooms in which the individuals receive instruction and (2) 
the adjustment of the effects of classroom characteristics for those 
of the individuals. The adjusted effect estimates and other by- 
products of the adjustment process are relevant to our interest in the 
components of the effects on pupil outcomes. 

In the Keesling-Wiley model, the adjusted effects of pupil level 
variables are found by controlling for all outcome-relevant class 
variables and for any interactions between class and student vari- 
ables. If no interactions exist, this adjustment leads to estimates of 
pooled within class slopes, $\beta_w$. Keesling and Wiley's only comment
on the effect of significant heterogeneity of regression is to indicate 
that such a condition would preclude using the same adjustment 
algorithm for each class. 

The second stage of the Keesling and Wiley model uses the ad- 
justed effects of individual level variables, aggregated over pupils 
within classes, to determine the adjusted effects of class level var-
iables. In practice, they apply the pooled within class slope from
equation (3.22) to obtain predicted individual outcome scores, that is,

$$\hat{Y}_{ij} = \mu_Y + \beta_w(X_{ij} - \mu_X).$$

Then the predicted mean outcome for each class based on the
pooled within class slope is determined:

$$\hat{\mu}_{Y_i} = \mu_Y + \beta_w(\mu_{X_i} - \mu_X).$$

Finally, a model is fitted at the class level regressing the observed
mean outcome for each class on class level explanatory variables and
the predicted mean outcomes for each class:

$$\mu_{Y_i} = \gamma_0 + \gamma_T T_i + \lambda\hat{\mu}_{Y_i} + \phi_i, \tag{3.23}$$

where $\gamma_T$ is the adjusted effect of the class level variable $T$, and $\lambda$
allows partial removal of additional specification bias due to omis-
sion of class level variables that are correlated with the sum of the
average individual level effect values represented in $\hat{\mu}_{Y_i}$. If all relevant
class level variables are included, $\lambda = 1$.

In this model, $\gamma_T$ is analogous to the adjusted teacher-classroom
effects from a between class analysis in the style of Cronbach. The
key distinction between $\gamma_T$ in equation (3.23) and $\beta_{\mu_Y T \cdot \mu_X}$ in the be-
tween class analyses of means from Cronbach is that the means of
individual level explanatory variables (the $\mu_{X_i}$) have been multiplied
by the constant $\beta_w$, the estimated pooled within class slope. Since a
linear transformation of the $\mu_{X_i}$ does not affect their relationship
to other explanatory variables, for a single class level variable,

$$\gamma_T = \beta_{\mu_Y T \cdot \mu_X}.$$


Thus the Keesling-Wiley analysis provides the same information 
about adjusted teacher-class effects as Cronbach's does. 
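
The equivalence can be seen in a small numerical sketch. It fits a between class regression of class mean outcome on class mean input and a class measure, then the Keesling-Wiley second stage, and compares the coefficients on the class measure. Sample estimates replace population parameters, the data are simulated, and `class_level_fits` is a hypothetical helper name.

```python
import numpy as np

def class_level_fits(x, y, t):
    """x, y: (K, n) pupil data; t: (K,) class level measure."""
    mu_xi, mu_yi = x.mean(axis=1), y.mean(axis=1)
    xw, yw = x - mu_xi[:, None], y - mu_yi[:, None]
    beta_w = (xw * yw).sum() / (xw ** 2).sum()     # pooled within class slope
    ones = np.ones(len(t))

    # Between class regression of class mean outcome on mean input and T.
    design_a = np.column_stack([ones, mu_xi - x.mean(), t - t.mean()])
    _, b_x, b_t = np.linalg.lstsq(design_a, mu_yi, rcond=None)[0]

    # Second stage: predicted class means from the pooled within slope,
    # then class mean outcome on T and the predicted mean.
    mu_hat = y.mean() + beta_w * (mu_xi - x.mean())
    design_b = np.column_stack([ones, t, mu_hat])
    _, gamma_t, lam = np.linalg.lstsq(design_b, mu_yi, rcond=None)[0]

    return {"beta_w": beta_w, "b_x": b_x, "b_t": b_t,
            "gamma_t": gamma_t, "lambda": lam}

# Hypothetical data: 40 classes of 15 pupils; T is correlated with class input.
rng = np.random.default_rng(2)
K, n = 40, 15
t = rng.normal(size=K)
x = rng.normal(50, 10, size=(K, n)) + 4 * t[:, None]
y = 5 + 0.6 * x + 3 * t[:, None] + rng.normal(0, 2, size=(K, 1)) + rng.normal(0, 4, size=(K, n))

fits = class_level_fits(x, y, t)
```

In this setup `fits["gamma_t"]` matches `fits["b_t"]` up to rounding, and `fits["lambda"]` equals the ratio of the between class input coefficient to the pooled within class slope.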

The information conveyed in the estimates of $\lambda$ may be im-
portant. It can be shown that for a single class level explanatory
variable, $T$, and a single individual level variable, $X$,

$$\lambda = \frac{\beta_{\mu_Y \mu_X \cdot T}}{\beta_w}.$$

But, if it is assumed that within class slopes are homogeneous and
that no other class level variables affect the average individual level
explanatory variables (the $\mu_{X_i}$), $\lambda = 1$. This implies that for properly
specified models,

$$\beta_{\mu_Y \mu_X \cdot T} = \beta_w.$$

The implication of the last step is that there is no class level relation-
ship of $\mu_{Y_i}$ to $\mu_{X_i}$ beyond that conveyed by the pooled within class
slope. That is, there are no class level effects of aggregate explanatory
variables. The above also implies that departures of $\lambda$ from one in-
dicate the presence of specification error such as the exclusion of a
relevant individual level variable or perhaps the presence of hetero-
geneity of within class slopes.

Empirical work with artificial data (Burstein, Linn, and Capell,
1978) indicates that the value of $\lambda$ is influenced by systematic re-
lationships between within class slopes, $\beta_i$, and teacher-class effects,
$T_i$. If these results hold up under more careful scrutiny, the Keesling-
Wiley strategy deserves much more attention than it has received
thus far.


Slopes as Outcomes (Burstein, 1976a; Burstein and Linn, 1976;
Burstein, Linn, and Capell, 1978). In contrast to Cronbach and to
Keesling and Wiley, the examination of slopes as outcomes (Burstein,
1976a; Burstein and Linn, 1976; Burstein, Linn, and Capell, 1978)
emphasizes the specific within class slopes. The analysis of the
pooled within slope represents the fall-back position in case the
analysis of the $\beta_i$ produces no interpretable antecedents of slope
heterogeneity.

The steps in the slopes as outcomes analysis are: 


1. For each class, find

$$Y_{ij} = \mu_{Y_i} + \beta_i(X_{ij} - \mu_{X_i}) + \epsilon_{ij}.$$

2. Fit a model at the class level with $\mu_{Y_i}$ (see equation [3.21]) and $\beta_i$
as outcomes and class level (or higher level) explanatory variables
as independent variables. The equation for $\beta_i$ is given by

$$\beta_i = \theta_0 + \theta_T T_i + \theta_X \mu_{X_i} + \omega_i. \tag{3.24}$$


The adjusted teacher-class effects in the slopes as outcomes analy-
sis are $\beta_{\mu_Y T \cdot \mu_X}$ in equation (3.21) and $\theta_T$ in equation (3.24). Thus,
the adjusted teacher-class effects on class means are the same as in
the Cronbach and the Keesling and Wiley analyses. The regression of
the within class slopes on teacher-class quality measures presumably
identifies antecedents of heterogeneity of within class regressions.
That is, $\theta_T$ may reflect systematic effects of teacher-class charac-
teristics on slopes. This interpretation of $\theta_T$ is directly relevant to an
elaboration of the components of teacher-class effects on pupil
outcomes.
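
The two steps can be sketched as follows. This is a minimal illustration under stated assumptions: ordinary least squares is used at both steps, the class level predictors are a single measure T and the class mean input, and the function name `slopes_as_outcomes` and the simulated data are hypothetical. It ignores the differing sampling precision of the class slopes, which a fuller analysis would need to address.

```python
import numpy as np

def slopes_as_outcomes(x, y, t):
    """x, y: (K, n) pupil input and outcome; t: (K,) class level measure.

    Returns the K within class slopes and the coefficients
    (theta_0, theta_T, theta_X) from regressing those slopes on T and
    the class mean input.
    """
    mu_xi, mu_yi = x.mean(axis=1), y.mean(axis=1)
    xw, yw = x - mu_xi[:, None], y - mu_yi[:, None]

    # Step 1: a separate within class slope for every class.
    beta_i = (xw * yw).sum(axis=1) / (xw ** 2).sum(axis=1)

    # Step 2: class level regression with the slopes as the outcome.
    design = np.column_stack([np.ones(len(t)), t, mu_xi])
    theta = np.linalg.lstsq(design, beta_i, rcond=None)[0]
    return beta_i, theta

# Hypothetical data in which the within class slope rises with T.
rng = np.random.default_rng(3)
K, n = 60, 25
t = rng.normal(size=K)
x = rng.normal(50, 10, size=(K, n))
true_slope = 0.5 + 0.3 * t
y = (30 + 2 * t[:, None]
     + true_slope[:, None] * (x - x.mean(axis=1, keepdims=True))
     + rng.normal(0, 3, size=(K, n)))

beta_i, theta = slopes_as_outcomes(x, y, t)
```

A nonzero coefficient on T in the second step is the kind of evidence the approach looks for: a class characteristic associated with steeper or flatter within class regressions.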


Pooled Cross-Sections and Time-Series Analysis (Hannan and
Young, 1976b). An analytical strategy that has some theoretical
appeal for dealing with the problems of dependencies among ob- 
servations within groups can be found in the econometric litera- 
ture on pooling cross-sections and time series. Hannan and Young 
(1976b) discuss the similarities in the problems that arise in analysis 
of multiwave panel data and multilevel cross-sectional data. In par- 
ticular, they argue that the application of the logic developed for 
multiwave panels (Nerlove, 1971; Maddala, 1971) to multilevel 
analysis problems would clarify issues in current work.

In practice, Hannan and Young (1976b) propose the applica- 
tion of generalized least-squares (GLS) estimation procedures to 
measure the intraclass correlation among pupil disturbances within 
classrooms. Their GLS procedures amount to a scheme for weighting 
pooled within group and between group estimates to provide 
efficient estimation of the desired parameters in the context of intra-
class correlation among persons within groups.
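
The weighting idea can be made concrete with the standard error-components (random effects) estimator from the pooling literature. The sketch below is not a reconstruction of Hannan and Young's own procedure; it assumes a single regressor, equal class sizes, and an intraclass correlation `rho` that is treated as known, and the function names are hypothetical.

```python
import numpy as np

def gls_slope(x, y, rho):
    """Weighted combination of within and between class information.

    x, y: (K, n) arrays; rho: assumed intraclass correlation of disturbances.
    """
    n = x.shape[1]
    psi = (1 - rho) / (1 - rho + n * rho)          # weight on between class information
    mu_xi = x.mean(axis=1, keepdims=True)
    mu_yi = y.mean(axis=1, keepdims=True)
    xw, yw = x - mu_xi, y - mu_yi
    xb, yb = mu_xi - x.mean(), mu_yi - y.mean()
    w_xx, w_xy = (xw ** 2).sum(), (xw * yw).sum()
    b_xx, b_xy = n * (xb ** 2).sum(), n * (xb * yb).sum()
    return (w_xy + psi * b_xy) / (w_xx + psi * b_xx)

def quasi_demeaned_slope(x, y, rho):
    """Same estimator computed by OLS on partially demeaned data."""
    n = x.shape[1]
    theta = 1 - np.sqrt((1 - rho) / (1 - rho + n * rho))
    xs = (x - theta * x.mean(axis=1, keepdims=True)).ravel()
    ys = (y - theta * y.mean(axis=1, keepdims=True)).ravel()
    xs = xs - xs.mean()                            # centering absorbs the intercept
    ys = ys - ys.mean()
    return (xs * ys).sum() / (xs ** 2).sum()

# Hypothetical data with a shared class disturbance.
rng = np.random.default_rng(4)
K, n = 20, 30
x = rng.normal(50, 10, size=(K, n)) + rng.normal(0, 5, size=(K, 1))
y = 10 + 0.7 * x + rng.normal(0, 3, size=(K, 1)) + rng.normal(0, 5, size=(K, n))

slope_weighted = gls_slope(x, y, rho=0.25)
slope_transformed = quasi_demeaned_slope(x, y, rho=0.25)
```

With `rho` set to zero the weight on between class information is one and the estimate reduces to the total regression; as `rho` approaches one the estimate approaches the pooled within class slope.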

Hannan and Young present neither simulated nor empirical data to
illustrate their approach in the context of the multilevel analysis
problem. Furthermore, their presentation assumes constant $\rho$ (i.e.,
constant dependency among persons within groups and the same
magnitude for all groups). The latter problem may not be serious,
since with a priori rules for partitioning the variance-covariance
matrix of disturbances, $\rho$ can be varied across classrooms to reflect
possible variation in intraclass correlations for different configura-
tions (e.g., individualized versus whole group instruction). Obviously,
further investigations of the Hannan-Young approach are needed
under conditions of presumed constant or variable intraclass corre-
lations.


Components of Covariance Analysis (Keesling, 1976, 1978).
Keesling (1976, 1978) pointed out that the observed variances and
covariances used in the typical between group analysis (e.g., equation
[3.21]) are generally combinations of both between group and with-
in group components of variance and of covariance. His reasoning is
based on the analogies between standard analysis of variance esti-
mates of treatment effects and the estimation of group effects in
hierarchically nested school data.

Keesling starts from the multivariate random effects model,

$$Y_{ij} = \mu + b_i + e_{ij}$$

($i = 1, \ldots, k$ groups; $j = 1, \ldots, n$ persons within each group of
equal size). The covariance matrix for the $Y_{ij}$ can be written as

$$\Sigma_t = \Sigma_b + \Sigma_w, \tag{3.25}$$

where $\Sigma_t$, $\Sigma_b$, and $\Sigma_w$ are the population variance-covariance matrixes
from the between student, between group, and within group data.
If $S_b$ and $S_w$ are the sample variance-covariance matrixes, maximum
likelihood estimators of $\Sigma_w$ and $\Sigma_b$ are given by

$$\hat{\Sigma}_w = S_w \tag{3.26}$$

and

$$\hat{\Sigma}_b = \frac{1}{n}(S_b - S_w). \tag{3.27}$$

Keesling argues that there may be some interest in specifying dif-
ferent relational models for the between components and the within
components of the covariance matrix. He also describes analytical tech-
niques from Schmidt (1969) for obtaining separate estimates of the 
two matrixes. 
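
A rough numerical sketch of separating the two matrixes follows. It assumes that the sample matrixes are the usual between group and within group mean-square matrixes from a one-way multivariate analysis of variance with equal group sizes, which is an assumption of this example and not a detail confirmed by the text; the function name `covariance_components` is hypothetical.

```python
import numpy as np

def covariance_components(data):
    """data: array of shape (K, n, p) for K groups, n persons, p variables.

    Returns (s_b, s_w, sigma_b_hat, sigma_w_hat), where s_b and s_w are
    mean-square matrixes (an assumed convention for this sketch).
    """
    K, n, p = data.shape
    group_means = data.mean(axis=1)                      # shape (K, p)
    grand_mean = group_means.mean(axis=0)                # shape (p,)
    within = (data - group_means[:, None, :]).reshape(-1, p)
    between = group_means - grand_mean

    s_w = within.T @ within / (K * (n - 1))              # within group mean squares
    s_b = n * (between.T @ between) / (K - 1)            # between group mean squares

    sigma_w_hat = s_w
    sigma_b_hat = (s_b - s_w) / n
    return s_b, s_w, sigma_b_hat, sigma_w_hat

# Hypothetical data: 30 groups, 20 persons, 2 variables (e.g., input and outcome).
rng = np.random.default_rng(5)
K, n, p = 30, 20, 2
group_effects = rng.multivariate_normal([0, 0], [[4.0, 3.0], [3.0, 5.0]], size=K)
person_effects = rng.multivariate_normal([0, 0], [[9.0, 2.0], [2.0, 8.0]], size=(K, n))
data = 50 + group_effects[:, None, :] + person_effects

s_b, s_w, sigma_b_hat, sigma_w_hat = covariance_components(data)
```

Estimates of the between component formed by subtraction are not guaranteed to be positive semidefinite in small samples, so they should be inspected before a relational model is fitted to them.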

Of value in their own right, the ideas suggested by Keesling are all 
the more interesting because of the possibilities of linking work on 
the analysis of multilevel data to the developments in the analysis of 
covariance structures (Jöreskog, 1970). Keesling's later paper (1978)
provides an example of how such a linkage might proceed. 


Multilevel Feedback Model of Contextual (Classroom) Effects 
(Young and Erbring, 1976). The primary contribution by Young 
and Erbring to the methodology for analyzing multilevel data is to 
redirect the arguments on the statistical existence of contextual ef-
fects toward questions about how to specify more accurately the
model underlying the group level processes. As a result, Young and 
Erbring expect that spurious interpretations of the statistical results 
on contextual effects will be reduced. 

Young and Erbring assert that even when it is recognized that 
many contextual effects are spurious, some intrinsic group properties 
may still remain. They prefer not to define context in statistical 
terms, but assume that there are real group processes. They suggest 
starting by specifying the process rather than the effect. 

Once specified, the group processes are described by models in- 
corporating endogenous feedback: if performance of one student 
affects the performance of others, notions such as dyadic linkages, 
mutual awareness, and face-to-face interaction become relevant. 
Exogenous variables such as pupil ability and family background 
feed back into the system through classroom interactions. 

According to Young and Erbring, the models required to identify 
the effects of group process are simultaneous equation systems. The
simultaneous nature of the equations, even using reduced forms,
poses difficult estimation problems requiring special methods
(Mitchell, 1969).


The approaches suggested by Hannan and Young, by Keesling, and 
by Young and Erbring draw heavily on structural equation methods 
adapted from econometrics. In general, such approaches require 
correct specification of models based on strong theories about the 
substantive processes under investigation and on reliable and valid
measures of the variables in the model. Unfortunately, current re-
search on the effects of education is relatively primitive when judged
by structural modeling standards.


The Role of Levels of Analysis in the Specification of Education Effects 177 


At the same time, however, further work with these approaches is 
warranted. Until we have had an opportunity to delineate further the 
similarities and differences among approaches when applied to large-
scale educational investigations, it would be inappropriate to exclude
any one approach from consideration solely on the basis of the in- 
ability of the educational profession to comprehend it or to put it to 
immediate use. 


Alternative Measures of Group Outcomes 

Above, it was suggested that analyses of group means can lead to 
mistaken impressions about the effects of education on pupil per- 
formance. Analytical approaches were described that, in one form or 
another, were intended to ameliorate the difficulties by taking into
account the multilevel character of the data. 

At this point we approach the problem from the perspective of 
alternative measures of group outcomes. Once it is determined that 
the questions of interest or the statistical considerations warrant 
analyses of aggregated data, the types of between group effects that 
one expects to find remain to be specified. In particular, when the
purpose is to determine factors affecting pupil performance, analyses
of between group (class, school, etc.) means can hide important
differences in the within group distribution of pupil outcomes and 
educational inputs. Different groups can have the same mean per- 
formance yet vary on other moments of the groups' distributions 
(e.g., variances). 
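
A toy illustration of the point, using made-up scores for two classes of three pupils: the helper `group_outcomes` (a hypothetical name) returns the class mean and the class standard deviation, so that each can be treated as a separate outcome measure.

```python
import numpy as np

def group_outcomes(scores):
    """scores: (K, n) outcome array; returns per class mean and standard deviation."""
    return scores.mean(axis=1), scores.std(axis=1, ddof=1)

# Two hypothetical classes with identical means but different spread.
scores = np.array([[48.0, 50.0, 52.0],
                   [30.0, 50.0, 70.0]])
means, sds = group_outcomes(scores)
print(means)  # both classes average 50
print(sds)    # 2.0 for the first class, 20.0 for the second
```

An analysis confined to the means would treat these two classes as equivalent, although the second has dispersed its pupils far more widely.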

A variety of educational theories lend support to the notion that 
specific instructional practices can affect the within school and
within class distribution of pupil performance. Moreover, the use of
distributional characteristics in addition to the mean as outcome
measures has been demonstrated empirically (Lohnes, 1972; Klit-
gaard, 1975; Brown and Saks, 1975). In an interesting treatment
by Brown and Saks (1975), school district level achievement data
for fourth graders from the Michigan Educational Assessment are
examined. Weighted district means and standard deviations are the
measures of outcome, and separate models are estimated for three
community types (cities, suburbs, and rural towns). Three district
characteristics (average experience of instructional staff, the ratio of
pupils to teachers and professional staff, and the percentage of
teachers with master's degrees) are included in the models, along
with district level SES and ethnicity measures. Brown and Saks
found that only three of the nine coefficients for district effects on
mean outcomes were significant, whereas six of the coefficients for
effects on standard deviations of outcomes were significant. The use
of the district standard deviations was particularly rewarding when
looking at suburban areas. 

As discussed above, within group slopes of outcome on input can 
contain important information about substantive educational effects. 
In classroom research, differences in slopes can arise when the alloca- 
tion of instructional resources among pupils varies from class to class. 
The possibility of heterogeneous within class slopes becomes obvious 
from the diversity of instructional practices that are characterized as 
either individualized, compensatory, or traditional. All things being 
equal, we expect that higher ability students make more appropriate 
instructional decisions and learn at a faster rate than lower ability 
students in a class that emphasizes individualized instruction with 
student self-pacing. Regardless of the overall effect on class mean 
performance, individualization could strengthen the relationship be- 
tween entering ability and student outcome, thereby exaggerating 
preexisting differences in pupil skills. 

In contrast, another class might emphasize mastery learning, where- 
in it is expected that most students will master the curriculum 
content. Or, the teachers might compensate for preexisting differ- 
ences by investing extra instructional resources in those students 
with the poorest entering performance. These types of practices may 
or may not raise the mean performance of the class as a whole. It is 
likely, however, that such practices would reduce the relationship 
between entering performance and outcome; that is, such com-
pensatory practices would lead to flatter within class regressions of
outcome on input.


The types of instructional strategies described above do exist in 
schools along with what might be called the more traditional pat- 
tern—the same instructional program being delivered to all students 
in the class, with heavy emphasis on whole class activities (lecture, 
seatwork). While these different types of instructional strategies 
may start with similar instructional resources, they distribute these 
resources to individual students in a variety of ways. Thus, it is 
possible that varying instructional strategies would yield different
class mean outcomes, and it is also plausible that different within
class slopes would result.


The implications of the scenarios presented above for analysis 
of educational effects data are straightforward. Instructional prac- 
tices can be combined with student characteristics and instructional 
resources to affect the mean outcomes of classrooms or a variety of 
other distributional properties of outcomes (Brown and Saks, 1975; 
Burstein, 1976a, 1978a; Burstein, Linn, and Capell, 1978; Linn and
Burstein, 1977; Wiley, 1970). In particular, there may be teacher-
class effects on the within class regression of outcome on input
whether or not there are teacher-class effects on class mean perform- 
ance. If such slope effects exist, the analysis should take them into 
account. 

Linn and I (Burstein and Linn, 1976; Burstein, Linn, and Capell, 
1978) generated hypothetical data to examine the effects of heter- 
ogeneous within class slopes on several analytical models for identi- 
fying educational effects. It was found that heterogeneous within 
class slopes can make important differences in identified educational 
effects when the magnitude of the within class slope is systematically 
related to class-teacher characteristics. The differences were not 
swamped by sampling variability in the estimation of slopes; more-
over, certain analytical approaches exhibited good properties even
in the presence of heterogeneity. 

A better test of the potential benefits of within group slopes as 
outcomes would be empirical evidence that within group slopes are 
related to school and classroom processes. Results of an analysis of 
science achievement data on U.S. fourteen-year-olds in the IEA
study (Burstein, 1977, 1978a; Burstein and Miller, 1978) provide 
evidence of the possible payoff from the slope measure. In this 
analysis, within school slopes of science achievement on a verbal 
ability measure (assessed concurrently) were significantly and posi- 
tively related to school mean responses of pupils on indexes of ex- 
posure to science study and of the degree to which pupils reported 
instructional practices that emphasized exploration (discovery
methods of instruction; see Table 3-9).

The steeper within school slopes with greater opportunities for 
exposure to instruction and with greater emphasis on individual 
exploration are consistent with expectations from other research 
(Bennett, 1976; Sørensen and Hallinan, 1977; Stebbins et al., 1977).
They also suggest the need for more fine-grained consideration of
slopes as outcomes and for similar investigations with other data sets.


[Table 3-9. School Level Regressions of Means, Standard Deviations, and Slopes on School Means of Background and Schooling Characteristics for Science Achievement of Fourteen Year Olds. Table body illegible in the source scan.]


SUMMARY COMMENTS


I have attempted to discuss the major issues about the role of levels
of analysis in the specification of educational effects. The key points
that either guided or grew out of the study can be summarized as
follows:


1. Investigations of educational effects are inherently multilevel.
That is, education involves students taught by teachers in class-
rooms in schools in school districts. Therefore, attempts to specify
educational effects involve analyses of multilevel educational data.

2. Research on problems of cross-level inference has shown that 
analyses of educational effects at different levels reveal substantial 
differences across levels for specific models. Different variables 
enter models at different levels. Aggregation typically inflates the 
estimated effects of background on outcomes and decreases the 
likelihood of identifying effective teacher-classroom-school char- 
acteristics and practices. 

3. Analyses involving both individual level and group level effects 
(contextual analysis) should be based on careful theory in which 
the source and form of group effects are specifically stated. More- 
over, purported group effects should be measured directly. 

4. Phenomena of importance occur at all levels of the educational 
system, and they need to be described and subjected to inference 
making. Thus, the focus of an investigation of education effects 
should be on properly specifying the substantive analytical 
model(s) rather than on making a choice among competing units 
of analysis. 

5. Choices among analytical strategies for multilevel data are not 
methodological decisions independent of the substantive questions 
being addressed. The choices are to a large degree constrained by 
the type of study that one is conducting and by the types of out- 
comes (e.g., short term-long term; specific-general; cognitive- 
affective) and processes under investigation. 

6. The major measurement issues that affect the specification of 
education effects in multilevel data involve the identification of 
adequate measures of outcomes and of the microprocesses in 
classrooms and schools. With respect to outcomes, measures of group 
outcomes besides the mean warrant further attention. Moreover, 
the measurement of classroom and school processes needs to be 
more directly associated with the educational experiences of 
individual children. Thus, efforts to achieve better matches 
between the actual educational “treatment” and the process 
measures associated with each student involve disaggregation of 
data and hence multilevel problems. 
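
The second point, that aggregation typically inflates the apparent effect of background, can be illustrated with a small sketch. The data below are hypothetical and were constructed for this purpose; the helper `pearson` and the school labels are illustrative assumptions, not part of any study cited here. Within each school, background and achievement are weakly and negatively related, yet the school means line up perfectly.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical (background, achievement) pairs for pupils in three schools.
schools = {
    "A": [(1, 4), (2, 1), (3, 3), (4, 2)],
    "B": [(3, 6), (4, 3), (5, 5), (6, 4)],
    "C": [(5, 8), (6, 5), (7, 7), (8, 6)],
}

# Pupil level analysis: every pupil is one observation.
pupils = [pair for pairs in schools.values() for pair in pairs]
r_pupil = pearson([x for x, _ in pupils], [y for _, y in pupils])

# School level analysis: every school mean is one observation.
means = [
    (sum(x for x, _ in p) / len(p), sum(y for _, y in p) / len(p))
    for p in schools.values()
]
r_school = pearson([x for x, _ in means], [y for _, y in means])

print(round(r_pupil, 3), round(r_school, 3))
```

With these constructed values the pupil level correlation is about 0.55, while the correlation among school means is 1.0. The same data therefore support quite different statements about background depending on the level analyzed.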


In rereading these summaries, I find myself returning to the 
concerns that opened the chapter. Like any other problem in the 
specification of educational effects, the role of levels of analysis needs to 
be approached cautiously and tentatively. Caution is necessary 
because the complexities are great and the natural inclinations of the 
analyst are to search for parsimony and generalizability. 

The above notwithstanding, this examination of the role of levels of 
analysis suggests some topics for further investigation. The direct 
measurement of processes within classrooms and schools is one such 
topic. Methods are also needed for incorporating these processes in 
analytical models that focus on the substantive questions guiding the 
research and that yield outcomes that are easily interpretable as 
consequences of educational processes. 


NOTES 


1. The discussion of different types of grouping draws heavily from Hannan, 
Nielsen, and Young (1975) and Hannan and Young (1976a). 

2. This definition of frog-pond effects is perhaps inconsistent with its original 
use (Davis, 1966; Meyer, 1970). Some authors interpret the coefficient of $X_{ij}$ in 
equation (3.12) as the frog-pond effect and never actually carry out the analyses 
based on equation (3.13). However, the analytical consequences of the choice 
between equations (3.13) and (3.14) are on the estimate of the context effect 
(i.e., $\beta_{y\bar{x}\cdot x} \neq \beta_{y\bar{x}\cdot(x-\bar{x})}$) rather than on the estimate of the frog-pond 
effect (i.e., $\beta_{yx\cdot\bar{x}} = \beta_{y(x-\bar{x})\cdot\bar{x}}$). So the choice is inconsequential for the 
substantive interpretations of frog-pond effects. 
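
The algebra behind this note can be sketched briefly. The form below is an assumption made for illustration: it takes the contextual model to have the usual linear form with the individual score $X_{ij}$ and the group mean $\bar{X}_{j}$ as predictors, and the symbols $a$, $b_1$, $b_2$, and $e_{ij}$ are illustrative rather than the chapter's own notation.

```latex
\begin{align*}
Y_{ij} &= a + b_1 X_{ij} + b_2 \bar{X}_{j} + e_{ij} \\
       &= a + b_1 \,(X_{ij} - \bar{X}_{j}) + (b_1 + b_2)\,\bar{X}_{j} + e_{ij}.
\end{align*}
```

Under this assumption the coefficient on the individual term is $b_1$ in both forms, whether the predictor is the raw score or the deviation from the group mean. Only the coefficient on the group mean changes, from $b_2$ to $b_1 + b_2$.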


3. This discussion draws heavily from Burstein, Linn, and Capell (1978) and 
related papers. 
the Elementary Classroom 

It is difficult to do research in classrooms. Serious problems 
confront the social scientist who would study various 
aspects of educational finance and productivity at the 
classroom level. Among the problems that may have to be faced 
are those related to using standardized tests as measures of student 
achievement, attributing effects to events in the classroom when 
students’ activities outside of school are not recorded, accurately 
observing certain teacher and student classroom behaviors, etc. 
A number of the more salient problems for classroom-based research 
are discussed in the first section of this chapter. Within the usual 
limits of time and fiscal support allocated to such studies, some 
of these problems may be insoluble. 

Many of the problems noted cast doubt on the profitability 
of doing traditionally designed large-scale observational studies 
of instructional processes. A more microanalytical approach, using 
small samples and dense observation, may be more profitable. 
Such small but in-depth studies are often called clinical studies. 
Clinical inquiry in the study of classroom decisionmaking, resource 
allocation, and the use of technology—various aspects of the study 
of finance and productivity—has much to recommend it. A more 
thorough discussion of the clinical attitude and clinical method 
applied to the study of classroom teaching and learning constitutes 
the second part of this chapter. 


PROBLEMS OF CLASSROOM RESEARCH 


Anyone who wishes to study teaching and learning at the 
classroom level must confront and solve (or at least learn to live with) 
a few unique problems. Among these are: 

1. Problems associated with standardized testing; 

2. Problems associated with defining the adjunct curriculum; 

3. Problems associated with the effects of the home environment 
on classroom activities; 

4. Problems associated with developing multivariate outcomes; 

5. Problems associated with defining the unit of analysis for the 
independent variable; 

6. Problems associated with sequencing instruction; 

7. Problems associated with recording the difficulty level of the 
instructional materials; 

8. Problems associated with determining rules for access to materials; 
and 

9. Problems associated with generalizability, including: 
a. The stability of teacher behavior, 
b. The stability of student behavior, and 
c. The generalizability of measures of teacher effectiveness. 


Problems Associated with Standardized Testing 

In studies of how teachers affect students, standardized achieve- 
ment tests are used extensively as criteria or outcome measures. 
As a group, these tests are highly reliable instruments. They usually 
have adequate curriculum content validity and can predict future 
academic success. However, they do have one overwhelming flaw: 
they simply may not reflect what was taught in any one teacher's 
classroom. The tests are designed to be used in all kinds of courses 
within a curriculum area and therefore cannot be completely 
sensitive or appropriate for any one teacher's teaching (Gall, 1973). 
They lack content validity at the classroom level. 
Different philosophies 
about what is important 
with the teacher’s likes 


є 1 » 85 can be seen from the dramatically 
greater allocation of time to those areas in contrast to the mean 
time each student in classes 1,8 
Table 4-1. Mean Class Time Allocated to Some Reading and Mathematics 
Content Areas in Second and Fifth Grade. 


Class Number 
Content Area Class 5 Class 21 Сиња Сила 
Mathematics (grade 2) 
Word problems 109 220 228 315 
Money 98 E io 100 
Linear measurement 29 130 p^ 399 
Fractions 0 a = is 
Addition and Subtraction, 
no regrouping, short form 835 420 1389 = 
Class 1 Class 3 Class11 Class 25 
Reading (grade 5) 
Comprehension, inference, 
Synthesis 235 252 T ma 
Identifying main items 
_ 1 reading 153 2 956 640 
Silent reading 1,083 Ie 664 1415 
Spelling 694 847 664 1,415 
Creative writing 56 343 98 pun 


Source: Berliner, 1979. Used by permission. 


These rather significant differences in the functional classroom 
curriculum should, by all we know about learning, result in 
considerable differences in achievement. If students in these second 
grade classrooms were tested at the end of the year on linear 
measurement, you might do well to wager that the students in class 13 
would demonstrate better performance than the students in class 5. 
If these fifth grade classes were part of some end of year statewide 
testing program where drawing inferences from paragraphs of prose 
was tested, as it often is, one might well expect that the students 
in classroom 11 would show superior performance when contrasted 
to similar students in the other fifth grade classes. Such simple 
hypotheses have been supported in analyses of these data (Fisher 
et al., 1978). 

The broad spectrum, standardized achievement test may be a 
social indicator by which state or national policy can be informed. 
But as long as teachers have the freedom to choose what areas they 
will emphasize and what materials they will use, these tests can 
never be used as fair measures of teacher effectiveness. At the 
classroom level, it simply is not fair to teachers to evaluate their students 
in areas that they did not cover or emphasize. Thus, between teacher 
comparisons of effectiveness using standardized tests as outcome 
measures cannot be defended unless natural variation in choice 
of content and time allocated to content areas of the curriculum 
are experimentally controlled. 


Problems Associated with Defining the Adjunct Curriculum 

When studying the teaching and learning of reading and 
mathematics, two of the most commonly examined subjects, a good deal 
of adjunct instruction is ignored. In a lengthy examination of a 
second grade classroom (Berliner et al., 1978), it was discovered 
that all work on fractions was introduced at the cooking center. 
A teaching aide presented this information while preparing recipes. 
The teacher had decided not to provide any instruction on fractions 
during the regular mathematics time periods. It was also found 
that a considerable amount of elementary social studies is taught 
by allocating time to the silent reading of relevant material. Some 
of the elementary school science curriculum is also taught this way. 
The teachers consciously use these other curriculum activities to 
build upon their reading programs. To study instructional decisions, 
then, for evidence about the use of classroom technology requires 
observers to be in the classroom for the entire day, over several 
days, observing many different activities in order to learn about 
the adjunct curriculum. Unless this can be done, a good deal of 
this adjunct academic curriculum will not be observed. 


Problems Associated with the Effects of the Home Environment on Classroom Activities 


The decisions a teacher makes in classrooms are affected in a 
number of ways by what happens to students outside school. The 
carrying out of homework assignments, the amount of parental 
interest shown, the students' time spent reading comic books and 
newspapers, and so forth all have ramifications for the classroom. 
These often unknown out-of-school activities cause peculiar 
problems for research. For example, the child who receives extensive 
guidance at home in reading and mathematics, either through help 
with homework or through general parental academic concern, may 
not pay attention to work assignments in school. Such children 
sometimes find school boring, since much school instruction is 
geared for the lower and lower-middle ability child (Lundgren, 
1972). Special decisions about instruction for these children need 
to be made. On the other hand, the child who receives no home 
support as a backup may be getting all his instruction in reading 
and math in the classroom. If the classroom is not one where engaged 
time is high, the entire elementary curriculum for a child may total 
well under one hundred hours per school year of engaged time 
in academic pursuits (Berliner, 1979). If so, special instructional 
support systems will be needed. In either case, what happens at 
home does affect the decisionmaking and instructional activities 
of teachers in classes. Not to know what goes on in the homes of 
children is to miss an important part of the forces that shape class- 
room reality and to which teachers must respond. The teacher's 
decisions about who will do drill work, who will play mathematics 
games, and who will help tutor others are affected by the teacher's 
perception of out-of-school family support for children's learning 
activities. Thus, classroom resource allocation decisions are some- 
times made on the basis of a teacher's perceptions of a student's 
life at home. The accuracy of these perceptions and the appro- 
priateness of the decisions are often completely unknown. 


Problems Associated with Developing Multivariate Outcomes 

There are at least two dependent variables in any instructional 
activity that should be of interest. One of these is the achievement 
of the learner in the situation. This has been a commonly used 
measure of instructional outcome. The other, less often examined, 
is the learner's feelings about the instructional situation. Research 
workers do not always ask students questions that probe their 
liking for their teacher or the subject matter. The research worker 
often overlooks inquiring about the students’ enjoyment of their 
classmates, the degree of threat felt in the class, and whether they 
would take more courses in that area. Even when such issues are 
addressed in research studies, the affective set of dependent measures 
is kept separate from the achievement measures. 

A problem in classroom research is to find ways to use 
multivariate outcome measures in which different kinds of achievement 
and affective responses are used as indicators of the quality of 
classroom life for a child. The problem is similar to the difficulties in teaching 
reading. You can instruct so that high comprehension at slow reading 
rates is achieved, or you can produce low comprehension at high 
rates of reading. But it is obvious that there must be some optimum 
multivariate outcome that simultaneously considers both reading 
comprehension and speed. The same kind of multivariate outcome 
measures, which simultaneously consider both achievement and 
affect, are needed for research on teaching. Not to consider 
simultaneously both what is learned and what is felt about that learning 
is to fractionate school learning into pieces that do not resemble 
the student's view of reality. 
Problems Associated with Defining the Unit of Analysis for the Independent Variable 

In recent studies of instruction, it appears as if the investigators 
have had a problem identifying the unit of analysis for 
characterizing the independent variable. Is the teacher's question the unit 
of interest? Is the question, along with the wait time, the unit? Or 
is the teacher's question, the wait time, and the student's answer 
the unit that best characterizes the independent variable? And if 
the latter is most appropriate, does that transaction become part 
of an episode or strategy of even more complex dimensions and 
longer duration? Teachers follow strategies of questioning and 
of discussion. In an inductive lesson, the meaningful unit of analysis 
may be a one hour or one week episode that is concerned with the 
conservation of matter. The individual questions, reinforcers, probes, 
and student responses may be trivial aspects of the overall episode. 
New conceptions for the units underlying independent variables 
used in studies of classrooms are needed. 

But the problem is not just with picking the appropriate teacher 
behaviors as independent variables for the study of teaching and 
learning. There are related and more general problems of defining 
the unit of analysis for the observation of classrooms. Is the indi- 
vidual student the focus of investigation? Would the observation 
of the student-teacher dyad be a more appropriate focus? Or would 
the small group, large group, grade level, or school be the proper 
focus of study for some questions, some times? There are similar 
problems to face about the dependent variable. One could study 
certain classroom processes in relation to one column addition, 
two column addition, addition and subtraction, or mathematics 
as outcome measures. The issue is to pick the proper characteristics 
of the phenomenon of interest in relation to the questions being 
asked. And this is hard to do. These issues of data collection are 
very similar to the issues raised by Leigh Burstein in Chapter 3 
about choosing the appropriate level of aggregation in data analysis. 


Problems Associated with Sequencing Instruction 


A special case of the unit of analysis problem is the problem 
of sequence in instruction. Teachers give a good deal of thought 
to how they will sequence instruction. They purposely make 
decisions about the allocation of time and other resources to imple- 
ment beliefs about sequence and its relation to achievement. One 
aspect of sequence is the cycle of activities chosen to promote 
learning. For example, a short chalk-talk about decoding blends 
may be given, using modeling and reinforced practice. This activity 
could be followed by allocating student time to decoding blends 
in a workbook; then a small group activity on decoding may take 
place. This might be followed by more decoding of blends in a 
workbook. This sequence of large group instruction, seatwork, 
small group work, and seatwork, all concerned with decoding blends, 
represents a sophisticated, though untested, instructional strategy 
of the teacher. The strategy provides for an acquisition phase, 
a testing phase, and a retention phase of instruction. 

In addition to the problems generated by the need to study 
the sequencing of instructional activities over relatively short periods 
of time, there is also the problem of studying the sequencing of 
instruction in a curriculum area over relatively long periods. Table 
4-2 presents data about the ways different teachers allocated time 
to the teaching of addition and subtraction to five second grade 
students over seventeen weeks of instruction. Besides differences 
in the total time allocated, very different patterns of sequencing 
are noted in these data. Student 0506 is instructed continuously 
in the content area. Student 1006, although receiving about 50 
percent as much time in this content area as student 0506, also 
receives instruction throughout most of the time period. On the 
other hand, students 0702 and 1501 receive instruction for about 
ten weeks and then receive almost no instruction for the remaining 
time. Finally, student 0406 has had instructional activities 
distributed primarily over two blocks of time, with a distinct period 
of no instruction occurring between instructional periods. From 
William James’s laboratory to today’s computer terminals, the 
issue of massed versus distributed practice has been vigorously 
studied. In these records of time allocation, reflecting instructional 
decisions of teachers, we see evidence that the debate is as yet 
unresolved. Because teachers sequence activities and allocate time 
in various ways, researchers must learn to follow these patterns 
to learn how classroom instruction really takes place. Observation 
instruments that do not pick up sequence in types of activities 
and in the scheduling of instruction miss important effects 
attributable to the teacher’s instructional decision making. 
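
One way an observation record might capture such sequencing is sketched below. This is an illustration under stated assumptions: the weekly minutes are hypothetical and are not the values of Table 4-2, and the function name `sequencing_summary` and its output keys are invented for the example. The sketch reduces a record of minutes per week to the total time, the number of weeks with any instruction, the number of separate instructional blocks, and the longest run of weeks without instruction.

```python
def sequencing_summary(weekly_minutes):
    """Summarize how instruction in one content area is spread over weeks."""
    blocks = 0          # runs of consecutive weeks with some instruction
    longest_gap = 0     # longest run of consecutive weeks with none
    current_gap = 0
    previous_active = False
    for minutes in weekly_minutes:
        active = minutes > 0
        if active:
            if not previous_active:
                blocks += 1
            current_gap = 0
        else:
            current_gap += 1
            longest_gap = max(longest_gap, current_gap)
        previous_active = active
    return {
        "total_minutes": sum(weekly_minutes),
        "weeks_taught": sum(1 for m in weekly_minutes if m > 0),
        "blocks": blocks,
        "longest_gap_weeks": longest_gap,
    }


# Hypothetical seventeen-week records for two students.
continuous = [30] * 17                      # taught every week
two_blocks = [40] * 5 + [0] * 6 + [40] * 6  # two blocks with a gap between

print(sequencing_summary(continuous))
print(sequencing_summary(two_blocks))
```

Two students with similar totals can differ sharply on the last two measures, which is the distinction between massed and distributed practice that a simple total conceals. Note that the gap measure as written also counts weeks without instruction at the start or end of the record.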


Problems Associated with Recording the Difficulty Level of the Instructional Materials 

It is becoming evident that the level of difficulty of the material 
for individual students in the classroom must be examined. One 
reason is that when materials are well matched to the ability level 
of the student you have a logical indicator of teacher accuracy 
in diagnosis and prescription, and diagnostic and prescriptive 
activities are crucial aspects of individualized programs of instruction. 
Another reason for this concern with difficulty level is the 
interpretation of the empirical data from the Beginning Teacher 
Evaluation Study (Fisher et al., 1978). In that study, using regressions, 
the percentage of a student's work in activities or materials whose 
difficulty was judged to be “hard” for a particular student was 
a consistent negative predictor of achievement. The percentage 
of material that was judged “easy” for a particular student was 
a consistent positive predictor of achievement. It is probable that 
some unknown ratio of easy to medium level difficulty is a strong 
predictor of achievement in elementary school classrooms. 

These findings indicate that those who would observe teaching 
and learning in classrooms will have to consider the difficulty level 
of the materials that the student is working with. It will take 
considerable effort to define “easy,” “medium,” and “hard” 
operationally. In addition to extensive observer training, such 
categorizations will require extensive amounts of observer intuition and 
judgment. 


Problems Associated with Determining Rules for Access to Materials 

How certain rules are formulated and enforced in classes affects 
student access to classroom resources. There are classrooms where, 
in the first few days of instruction, well over one hundred rules 
for student behavior are clearly expressed by the teacher (Tikunoff 
and Ward, 1978). Such rules include many teacher statements about 
pushing in line and chewing gum in class. But more subtle and 
therefore more difficult to spot, yet much more important for 
research in productivity and finance, are the rules for trading in 
workbooks, for obtaining free time, for access to mathematics 
and reading games, and so forth. These rules prevent or enhance 
the chances that certain students will engage in particular activities. 
The major rules for work in a classroom are developed and codified 
in the first few days of school, a time when observers are usually 
asked to stay out of schools. Thus, understanding the rules for 
behavior in a classroom becomes, in part, a problem in the timing 
of research. Unless researchers know something about the 
periodicity of certain events (e.g., rule setting in the first few days, costume 
design before Halloween, the study of plants in the spring, 
preparation for testing at the end of school, etc.), they may make mistakes 
in the timing of their observations. If the timing is wrong, learning 
about the origins of a classroom's rules may be impossible. Yet 
without knowledge of the rules and some understanding of their 
functional significance for teachers, the study of student access 
to resources will be incomplete. 


Problems Associated with Generalizability 

Perhaps the biggest problem for those who would study in- 
struction is the problem of the stability of the behaviors of the 
subjects and events under study. There are three important aspects 
to this problem that need to be considered—the stability of teacher 
behavior, the stability of student behavior, and the stability of 
measures of teacher effectiveness. 


The Stability of Teacher Behavior. Before entering a classroom 
to code teacher behavior in any sensible way, an observer has to be 
sure of two things: first, that the frequency of the event one is 
trying to observe is high enough so that at least one instance will 
occur during the observation period; second, that the behavior 
to be coded represents the teacher's usual and customary way 
of behaving. Only if these conditions are met can a teacher's be- 
havior be sensibly characterized by the frequency count or rating 
scale description obtained in observations of classroom activities. 

Many studies relating teacher behavior to student outcome have 
examined teacher behavior that did not occur frequently. For ex- 
ample, among thirty-two primary grade science teachers, the use 
of questions calling for identifying relationships, hypothesizing, 
and testing hypotheses is extremely rare on any given occasion of 
observation (Moon, 1971). Another case of low frequency events 
in an important area of teaching has to do with the management 
skills of teachers. In some communities, classroom management 
is not difficult. Students are motivated, and parents exert pressure 
for conformity to school rules, so that traumatic disturbances are 
very infrequent. In other communities, serious problems exist all 
day long. Therefore, to observe instances of teacher behavior in 
the area of classroom management, designers of research must 
remember to take ecological factors into account. Furthermore, 
it has been learned that even in settings where management 
problems usually occur with high frequency, certain teachers are so 
quick to establish a nondisruptive social system that by the time 
the observer enters the class, particular kinds of events have been 
precluded from occurring. This is another example of the problems 
associated with the timing of research, discussed above. 

How then can one study teacher behavior when important 
variables in the study rarely occur? One answer, of course, is denser 
observation. Five one hour observations of teacher behavior, which 
is unusually high for most studies of teaching, may simply not 
provide all the information an investigator wants. In addition, part 
of the answer is knowing when and where to observe. For example, 
the first two weeks of schooling would be important for a study 
of management skills in inner-city schools, while trying for denser 
observation later in the year, in other types of schools, might be 
wasted effort. 

The problem of estimating behavioral stability is partly related 
to the problem of the frequency of occurrence of behavior. When 
the frequency of a behavior is low, the correlations between the 
frequency of occurrence for certain events, over occasions (that is, 
a coefficient of stability for the behavior), will be low. But part 
of the problem is quite distinct from the frequency issue. Think 
for a moment about the characteristics you prize in a teacher. 
Usually, people think of “good” teachers as flexible. Such teachers 
are expected to change methods, techniques, and styles to suit 
particular students, curriculum areas, time of day or year, and 
so forth. That is, our standard of excellence in teaching implies 
a teacher whose behavior is inherently unstable. Needless to say, 
that is a problem for an observer who is trying to measure a teacher's 
customary and usual ways of teaching. 

For our study of teaching, we have reviewed teacher stability, 
over occasions, for a great many variables (Shavelson and Dempsey, 
1975). The results are fascinating. On the laughable side are the 
coefficients of stability from Campbell’s (1972) analysis of science 
teaching at the junior high school level over two occasions. The 
Flanders Interaction Analysis System was used, and the stability 
coefficient (that is, the correlation between a teacher’s standing 
on a measure across two occasions) was, for a measure of 
indirectness in teaching (the i/d ratio), -0.90. On five occasions, Moon 
(1971) studied thirty-two primary grade science teachers trained 
in the Science Curriculum Improvement Study. The stability 
coefficient for the Flanders indirectness measure went all the way 
up to 0.18; for the frequency of fact or recall questions, the 
stability coefficient was -0.12; and for amount of teacher talk, only 
0.12. In Borg's (1972) study, the behavioral stability of teachers 
was measured after training in questioning techniques had taken 
place. The stability of the ratio of higher-order-to-fact questions 
was 0.07. The rather large number of low and even negative 
stability coefficients that exist in the literature confirms our belief 
that the independent variables we often work with in studies of 
teacher effectiveness are not fair indicators of typical behavior. 
We are so eager to capture variables for data analysis with our rating 
scales and frequency counts that we seem to have forgotten to 
check whether our methodology is appropriate to the phenomena 
we are studying. 
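
The stability coefficient discussed here is simply the correlation, across teachers, between a measure taken on one occasion and the same measure taken on another. A minimal sketch follows; the counts are hypothetical values for five teachers, and the names `stability_coefficient`, `occasion_1`, `occasion_2_similar`, and `occasion_2_shuffled` are illustrative assumptions rather than data from any study cited above.

```python
from math import sqrt


def stability_coefficient(first, second):
    """Pearson correlation of teachers' scores across two occasions.

    `first[i]` and `second[i]` are the same teacher's scores on each occasion.
    """
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return cross / sqrt(ss_1 * ss_2)


# Hypothetical frequency counts of one behavior for five teachers.
occasion_1 = [1, 2, 3, 4, 5]
occasion_2_similar = [2, 1, 4, 3, 5]    # teachers keep roughly the same rank
occasion_2_shuffled = [3, 5, 1, 4, 2]   # ranks change between occasions

stable = stability_coefficient(occasion_1, occasion_2_similar)
unstable = stability_coefficient(occasion_1, occasion_2_shuffled)
print(round(stable, 2), round(unstable, 2))
```

In the first comparison the coefficient is 0.8; in the second it is -0.3, although the set of counts is identical on both occasions. A low or negative coefficient therefore says that teachers changed their relative standing, not that the behavior itself became rarer.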

Of course, there are many exceptions to the trend for teacher 
behavior to be unstable. We have found ratings of variables over 
ten occasions that yield high stability coefficients. These include 
coefficients of 0.92 for teacher warmth, 0.79 for teacher enthusiasm, 
and 0.83 for teacher sensitivity (Wallen, 1969). We have found 
frequency counts demonstrating that a global variable composed 
of all types of reinforcement is reasonably stable over occasions, 
yielding a stability coefficient of 0.64 (Trinchero, 1974). In the 
latter study, however, we find considerable evidence pointing to 
the lack of generalizability of stability coefficients across different 
teacher populations, curriculum areas, and student populations. 
For example, the stability coefficient over two occasions for the 
frequency of positive verbal behavior was 0.04 for English teachers 
and 0.57 for social studies teachers. 

By examining the stability of teachers’ behavior, which is used 
as the independent variable in studies of teacher effectiveness, 
we conclude that 


1. Some teacher behavior that we think important to study occurs 
infrequently. To study it requires extensive observation in par- 
ticular settings at appropriate times. 

2. Some teacher behavior that we think important to study is un- 
stable over occasions. No practical amount of observation will 
result in a reliable estimate of a teacher's use of such behavior. 
Perhaps we need to develop measures of variance instead of 
measures of central tendency to describe it. 

3. Some teacher behavior is stable over occasions. In general, 
but not always, ratings or high inference variables, rather than 
frequency counts or low inference variables, are the more 
stable. 

4. Stability coefficients for much teacher behavior will not demon- 
strate ecological or population validity. Teacher behavior is 
moderated, as it should be, by the kinds of students and the 
variety of settings that teachers work in. 


Until we know more about what teacher behavior fluctuates, and 
how and why it fluctuates over time, settings, curricula, and popu- 
lations, studies relating teaching behavior to student outcomes 
must remain primitive. 
The Stability of Student Behavior. Another problem of generaliz-
ability in research on teaching and learning has to do with the
stability of the student from day to day. My colleagues and I have 
recently spent a good deal of time studying the engagement of stu- 
dents in classrooms. Our evidence tells us that day-to-day stability 
of engagement rates for individual students is nonexistent. Since 
engaged time in academic pursuits is one reliable predictor of school 
achievement (Fisher et al., 1978), information about the stability
of this behavior becomes important. But we do not know what 
makes for stable or unstable engagement rates across subject matter 
areas, grade levels, or even ten minute segments of instruction. 
Engagement for particular students in particular content areas 
of the curriculum varies from 0 percent to 100 percent for con-
secutive ten minute blocks of time. Traditional approaches to in-
quiry in this area examine mean engagement for a student or a
class. This could lead to an estimate that a student or a class is 
engaged about 70 percent of the time over different tasks and 
different days. But it is the variance of such behavior, within and
across tasks and days, that is most interesting. The reactivity of 
the variable of engaged time to perceived and actual environmental 
changes can be understood only by inquiring of the subject in-
volved, by observing, and by employing small-scale interventions
in the environment. To study variation in engaged time, a clinical 
approach with a student or a class may yield information of great 
value to the teacher and the students. 
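
The contrast between mean engagement and its variance can be illustrated with a small sketch. The engagement percentages below are invented for illustration, and `summarize_engagement` is a hypothetical helper, not an instrument from the studies cited.

```python
from statistics import mean, pvariance


def summarize_engagement(blocks):
    """Return (mean, variance, minimum, maximum) of engagement percentages.

    Each entry in `blocks` is the percent of one ten minute block that a
    student was observed to be engaged (hypothetical data).
    """
    return mean(blocks), pvariance(blocks), min(blocks), max(blocks)


# Two hypothetical students observed over six consecutive ten minute blocks.
steady_student = [70, 70, 70, 70, 70, 70]
volatile_student = [100, 0, 100, 100, 20, 100]

steady = summarize_engagement(steady_student)
volatile = summarize_engagement(volatile_student)
print(steady)
print(volatile)
```

Both students average 70 percent engagement, so a report of the mean alone would treat them as identical. Only the variance and the range show that the second student swings between complete engagement and none at all.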


The Generalizability of Measures of Teacher Effectiveness. To 
characterize teachers as more or less effective, we need to know 
whether they maintain their rank ordering on measures of effective- 
ness over time and over subject matter areas. There are about eight 
studies of teacher effectiveness over lengthy periods of time (see
Shavelson and Dempsey, 1975). The mean of these correlations 
between teacher effectiveness measured two or more times is about 
0.30. This figure is based on data from predominantly primary age 
children tested with standardized reading and mathematics achieve-
ment tests. Brophy’s (1973) study presents some interesting data.
Residual gain scores over three years were examined for 165 ele-
mentary teachers. Of these teachers, 28 percent were consistent
in their effects on students three years in a row. Approximately
14 percent were consistently effective in producing higher than
predicted reading and math achievement, and 14 percent were
consistent in being associated with classes that had scores lower
than predicted. On the other hand, 13 percent of the teachers
showed linear increases in residual gains over the three years. That
is, they appeared to be getting more effective in their teaching. 
Similarly, 11 percent showed a linear decrease over that time period. 
They seemed to be getting less effective over time. The remaining 
49 percent of the teachers in this sample were inconsistent in the 
patterning of their residual scores over time. 
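
As a rough illustration of how residual gain scores of this kind might be computed, the sketch below regresses posttest on pretest scores and averages the residuals by teacher. It is a simplified assumption about the general technique, not a reconstruction of Brophy's analysis; all names and scores are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("pretest scores must vary")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def residual_gains(pretest, posttest):
    """Actual posttest score minus the score predicted from the pretest."""
    slope, intercept = fit_line(pretest, posttest)
    return [post - (intercept + slope * pre)
            for pre, post in zip(pretest, posttest)]


def class_mean_residuals(pretest, posttest, teacher_ids):
    """Average the students' residual gains within each teacher's class."""
    by_teacher = {}
    for teacher, residual in zip(teacher_ids, residual_gains(pretest, posttest)):
        by_teacher.setdefault(teacher, []).append(residual)
    return {teacher: mean(values) for teacher, values in by_teacher.items()}


# Hypothetical scores for six students taught by two teachers.
pre = [10, 20, 30, 10, 20, 30]
post = [25, 35, 45, 15, 25, 35]
teachers = ["A", "A", "A", "B", "B", "B"]

effects = class_mean_residuals(pre, post, teachers)
print(effects)
```

A positive class mean indicates achievement higher than predicted from the pretest, and a negative mean indicates achievement lower than predicted. Repeating the calculation for each year would give the sequence of residuals whose consistency, or lack of it, is at issue in the text.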

In a review of short-term studies of teacher effectiveness, ranging 
across grade levels and all kinds of curriculum areas, moderately 
stable estimates of teacher effectiveness are obtained when the same 
content is taught to similar students (e.g., teaching and reteaching an 
ecology lesson to two samples of urban students). But when different 
content is taught to two or more groups of similar students the 
effectiveness measures are not found to be stable. Similarly, when 
different content is taught to the same students, estimates of ef- 
fectiveness from occasion to occasion are unstable. Another study of 
this problem involved about 200 elementary school teachers, each of 
whom taught a two week, specially designed teaching unit in reading 
and mathematics. Residual gain scores for each class in each subject 
matter were calculated. These measures of effectiveness, using 
different content and the same students, were correlated. From these 
data we find that measures of effectiveness in the two curriculum 
areas correlate about 0.30 (Berliner et al., 1976). 

It appears that teachers do not, by and large, remain in a stable 
ordering on measures of teacher effectiveness. If, as we have dis- 
cussed, the independent variables we typically look at are often 
unstable and measures of teacher effectiveness also show instability, 
the possibility of correlating teacher behavior with student achieve- 
ment to determine effective teaching behavior is very limited. 


Summary 

Observing in classrooms is difficult. Some of the problems may 
be insurmountable if traditional research methods, including simple 
regression and the testing of models using regression approaches, 
are used. However, there are other forms of inquiry besides those 
that lead to analyses by regression equations. Many of these other 
forms of inquiry are clinical in nature and rely upon methods that 
are more descriptive and less quantitative. When we examine the 
phenomena to be studied and look at some real classroom problems,
the clinical method appears to gain in appeal. 


THE PHENOMENON TO BE STUDIED 


The core of the phenomenon of interest is the day-to-day interaction 
between teachers, students, tasks, and materials. This process can 
be observed and experienced in any local elementary school. How-
ever, what we observe in a classroom is the surface structure of
the phenomenon, whose deeper structure is rooted in a series of
social, political, and economic systems. Indeed, attempts to under-
stand portions of the schooling process in isolation from the rest
of the phenomenon have been of very limited utility. Three charac- 
teristics of the phenomenon have important consequences for 
understanding teaching and learning in classrooms. 

First, the phenomenon is remarkably complex. The factors affect- 
ing school learning must surely number in the thousands. These 
include presentation variables, management and control structures, 
physical layout of the classroom, class membership, social milieu 
within the class, staffing and resource allocation within the school, 
and so on. The network of influences from these sources interacts
in a complex manner to produce, or at least moderate, the behavior
observable in the classroom. In addition, the inhabitants of each
classroom fill a wide variety of roles. Teachers operate as presenters,
organizers, providers, and punishers. They are also parents, tax-
payers, and employers. Students develop and maintain roles in rela-
tion to teachers, principals, and peers. These and other factors attest
to the complexity of the teaching-learning process in elementary
schools.

Second, the phenomenon is dynamic. The factors that give rise 
to particular types of behavior during one hour, or one day, do
not continue in a steady state for very long. Influences on individual
children and teachers, as well as influences on classes and schools,
change from minute to minute and from day to day. It is a common-
place to hear a teacher state that “today is just not a typical day.”
This statement is true, in that the process changes enough from one
day to another that few days seem “typical.”

Third, the phenomenon is extensive in time. A student is typi-
cally grouped with a class of thirty students and one teacher
for ten months at a time. Social as well as cognitive skills are
developed slowly over relatively long periods. Learning to read
and write is achieved over several years. For teachers too, the
phenomenon is extended in time. Teachers spend about 900 hours
per school year with students. Substantial changes in teacher
or student behavior can hardly be seen in a week, let alone a
day.

The complex, dynamic, and extensive characteristics of the
phenomenon have not often been given enough attention. Conven-
tional research often ignores the implications of these characteristics
for the study of classroom teaching and learning. 
MATCHING METHOD OF INQUIRY WITH
THE PHENOMENON OF INTEREST 


A kind of inquiry into the realities of classrooms that is also appro- 
priate for the phenomena under study depends upon methods 
ordinarily called clinical. Such methods have been undervalued 
and consequently underutilized in research on teaching and learning 
in classrooms. Clinical method qua method has been neglected 
as a means of inquiry for a number of reasons. First, clinical work 
usually takes a longer time than the experimental methods used 
in educational inquiry. Second, clinical method implies helping 
(e.g., clinical psychology) and the related belief that knowledge 
generated while serving to help others is of less worth than know- 
ledge arrived at by other means. Finally, our prestigious journals 
are loath to publish clinical studies because of their applied nature 
and suspect methods. For these and other reasons, such studies 
do not bring professional rewards, and this reduces the incidence 
of clinical inquiry by researchers. 

To promote critical thinking about clinical methods in instruc- 
tional research, particularly as applied to the study of teaching 
and learning in classrooms, I will try to distinguish clinical method 
from other methods, note some unfortunate and inaccurate conno- 
tations associated with the method, describe a number of classroom 
phenomena uniquely suited to study by clinical approaches, and 
discuss methodological techniques compatible with clinical forms 
of inquiry. 


CLINICAL INQUIRY: AN ATTITUDE 
AND A METHOD 


What often distinguishes a clinical internist, a clinical neurologist, 
or a clinical psychologist is the attitude taken in a scientific inquiry.
If the approach is that of understanding the individual person or
individual classroom for the sake of helping the person or the partici- 
pants in that classroom to function better, the clinical attitude 
and consequently the clinical method have been adopted (Watson, 
1963). Clinical method in psychology is generally thought of as
the application of psychological principles and techniques to the 
problem of the individual. Thus, clinical method encompasses a
broad spectrum of ideas and practices. Clinical method in the study 
of classroom teaching and learning, and in the service of the clinical 
attitude, is designed to promote the welfare of the individual in 
classrooms. The clinical instructional researcher, like most clinicians, 
is a scientist, although not all scientists are clinicians. The basic
data are the same for the clinical and nonclinical educational re- 
searcher—the behavior and experiences of participants in the edu- 
cational process. The clinician, however, unlike his nonclinical 
colleague, is sometimes compelled to do something immediately 
useful with his observations and hypotheses. That urgency to act 
may demand reliance upon data of unknown reliability and validity, 
depend upon subjective interpretations of phenomena, and require 
the use of intuition by the investigator. 

But many educational researchers seem to have forgotten about 
the clinical attitude and have even more often neglected clinical 
method. Some reasons for the retreat from the clinical attitude 
and method have to do, in part, with the connotations associated 
with the term clinical. 


CONNOTATIONS OF THE TERM CLINICAL 


Clinical is a word often associated with such terms as idiographic, 
applied, and qualitative, which have low respectability in the
scientific world. It is a word often placed in opposition to such
terms as statistical or experimental, which have very high respect-
ability in the scientific world. Because of the company it keeps
and the opposition forced upon it, the word clinical has taken on
lower class status. But let us examine these terms a little more
closely. 


Nomothetic and Idiographic Qualities

Because of the concern with individual cases, it is felt that clinical
work can never yield the nomothetic generalizations and findings
that the scientific endeavor so values. Yet Cronbach (1975), re-
viewing the findings of traditional educational research, has seriously
questioned the generalizability of the findings held in such high
repute. Social science research in general, and educational research
in particular, he says, are so complex that generalizable findings may
elude us forever. Too many variables interact simultaneously for us
to be able to study their joint effects, and too many effects change
over time. This makes it very difficult to rely on the replication of
social science findings in a particular setting at a particular time.

Can clinical information be so much worse? Was the world en-
riched or impoverished by Freud’s insights? Are Erikson's intuitions
about developmental crises of no scientific merit? Are the behavioral
data from single subject studies of behavior modification of such
little worth?
These three powerful concepts and methods in psychology were
the result of clinical, not traditional, research approaches. From 
these examples we learn two things. First, as in the case of Freud 
and Erikson, clinical inquiry can serve as the basis for theory. In 
helping an individual person or classroom function better, a clinical 
psychologist can develop—through observation, interview, record 
keeping, and analysis—powerful conceptualizations of human be-
havior. Some clinical work is concerned with generalizability (e.g.,
double-blind studies in medical research). However, much of it 
is idiographic. But regularities in idiographic phenomena can become 
the basis for nomothetic generalizations. Second, as in the behavior 
modification studies, we can see how clinical endeavors apply well- 
respected scientific findings in respectable scientific ways, while 
helping individuals. The Journal of Applied Behavior Analysis
is devoted to such applied, clinical science and does not differ 
in function from clinical journals of medicine. 

I conclude that the difference between nomothetic and idio- 
graphic approaches, in terms of their yield of generalizable findings, 
is not as clear-cut as was once thought. Thus clinical methods,
which are necessarily idiographic, need not be downgraded for that
reason.


Basic and Applied Research 

The word psychology, sociology, anthropology, or economics 
prefaced by the term educational connotes an applied discipline. 
Such applied disciplines are concerned, however minimally, with 
the improvement of education. Implicit in almost all educational 
research is the clinical attitude and, thus, an applied focus. Should 
it be necessary, I suppose, we can talk about more and less directly 
useful research. But to talk of basic and applied research in education 
is, perhaps, to misunderstand the goal of an applied discipline. 
Cronbach and Suppes (1969) tried to avoid this problem by talking 
about decision-oriented and conclusion-oriented research in edu- 
cation. They pointed out that equating basic research with good 
research and applied research with bad or sloppy research is useless. 
Research studies differ simply as a function of the different ques-
tions they are designed to answer. It should be repeated, over and
over again, that good research is a quality that is independent of
whether or not a study is directly or indirectly useful in education. 

Recently, in the presidential address to the American Educational 
Research Association, Kerlinger (1977) made the annual plea for 
more basic research and managed to equate basic research with good 
research. But others have a different perspective. Jackson and Kiesler 
(1977), discussing the same issues, take a much more sophisticated
view of the benefits and effects of more or less directly usable 
research. They conclude, as does Kerlinger, that a proper mix of the 
two forms of inquiry is needed. But, unlike Kerlinger, they assert 
that 


ideas that are good . . . are those buttressed by rational and empirical 
arguments, which are the kind of arguments offered by scientific re- 
search and disciplined scholarship. Some knowledge, on the face of it, 
is closely related to the substantive concerns of educators, some more 
distantly so. Within broad limits, it is the former to which we would 
give preference in seeking support for new endeavors. (Jackson and 
Kiesler, 1977: 


And Slavin (1978), taking vociferous exception to Kerlinger’s call 
for more basic research by the National Institute of Education, 
concludes that “what is needed in education is more, not less, 
research directed at the improvement of instruction and of the 
schooling experience for children” (p. 17). That kind of statement 
represents the clinical attitude. 

Ultimately, as a profession, what we want is good research, of
whatever degree of applicability. I personally want to see more good
research that has a high degree of immediate usefulness.

Quantitative and Qualitative Data 

Another unfortunate distinction is found in the association of
the term clinical with the term qualitative. It is as if a clinician
eschews quantification, when no such supposition need be made.
Data are always, in some sense, both quantitative and qualitative.

Because of the focus on individual people or classrooms, events
that occur just once may be given special weight by a clinician.
This appears to be qualitative. Because the clinician is particularly
aware of ecological contexts, descriptions of environmental press
variables become important. This appears to be qualitative. If com-
pelled to help, the clinician may act before reliable and valid re-
lations are established. Although the clinician may not make public
the numerical values of variables, this is not the same as avoiding
quantification. In the course of the work, a clinician must do some
rapid and complex quantification. Weights are assigned to
events, probabilities are estimated, Bayesian hypotheses are tested,
and sophisticated judgments of utilities are computed. This is not
always articulated, but it is always present. Scientists must engage
in some degree of quantification, but they need not always define
their variables in that manner. Scientists should also respond to
the qualitative aspects of their variables. It is these qualities that 
elicit insights and help clarify thinking about complex events, such 
as n-way interactions in data. This kind of analysis of data is neces-
sarily more qualitative. In sum, empirical data, which are the funda-
mental data of clinician and nonclinician alike, are always more
or less qualitatively analysed at some point in the study. Thus, 
the distinction between quantitative and qualitative is one of em- 
phasis in research styles and not a clear dichotomy. The recent 
distinction between data analysis and statistical analysis made by 
Tukey (1977) is relevant here. 


Clinical versus Statistical Approaches 

The work of Meehl (1954) and others on statistical versus clinical 
prediction in clinical psychology set up an unfair opposition. The 
solely clinical prediction, if formal tests exist, is as silly as making 
an estimate of a person's temperature when thermometers exist. 
If thermometers exist they should be used. If reliable and valid 
scales and instruments are indicators of conditions, events, and 
states of people, they too should be used. The combination of 
what has been called clinical intuition and statistical information 
is what is valued and successful. Either alone has faults. 


Clinical versus Experimental 

Another opposition is that between clinical and experimental. 
An experiment, like a test, is nothing more than a controlled ob- 
servation. Clinical method, while often relying on observations 
of natural behavior, has always supplemented such observations 
with tests and has always proposed experiments for the subject of 
the clinical inquiry. Piaget's methods are a case in point. Although 
not possessing the clinical attitude (he is not interested in helping 
anyone improve anything), his techniques are clearly clinical.
He observes, he verbally probes and tests, and he does experiments, 
often with his own children. Is Piaget a great clinician, a great ex- 
perimentalist, or simply a great scientist? I think the latter, and 
thus the clinical versus experimental dichotomy is also found 
wanting. 


Preliminary State of Research or Not 

In a science where experiments are possible and where multivariate 
statistical packages and giant computers are at our beck and call, 
it is often believed that clinical methods serve as a first stage of 
inquiry: information obtained from clinical inquiry should lead, 
ultimately, to hypothesis testing in randomized true experiments.
This need not be so. Ethologists, anthropologists, sociologists, 
economists, and many psychologists are quite happy to make clinical 
inquiries and to use evidence obtained in such inquiries as the basis 
for decisions, conclusions, models, and theories. And this is justi- 
fiable. Clinical evidence need not always be thought of as the first 
step in inquiry; it is one of numerous ways of obtaining information
that may or may not be subject to further inquiry using other 
methods. More important than providing variables for regression 
equations is the verification of clinical insights by other researchers 
using whatever methodology is appropriate. 


Summary 

The connotations associated with the term clinical do not, on 
close examination, appear nearly as negative as might be feared. 
Clinical work is idiographic, but can serve nomothetic causes; it 
is more applied than basic, but that is often why one studies edu- 
cational phenomena; it is concerned with both qualitative and 
quantitative data, the distinction being less apparent once one 
examines any empirical data; clinical work is both intuitive and 
statistical, melding the best of the sensitive human information
processor with other kinds of sensitive instrumentation; the clini-
cian may be an experimentalist, but is not usually preoccupied with
large samples. Finally, clinicians need not view themselves as pro-
viding preliminary data for later experimental work. They can choose
to see themselves as providing data and ideas that can stand or fall
like any other ideas. All clinical insights are subject to verification
in the same way that any other findings or ideas are subject to
verification—through independent empirical confirmatory evidence
and the consensus of opinions from others in the field.

In my opinion, clinical method need not be held in disdain or
disrepute. Clinical methods can be employed to understand partic-
ular problems in education, some of which may not be understood
using more traditional forms of inquiry. Some educational areas
of interest that lend themselves to clinical inquiry are described in
the next section.


REALITIES OF CLASSROOMS AND THE
NATURE OF INQUIRY


I would like to present briefly a number of problems related to the
study of productivity and finance that I find puzzling and interest-
ing. I also believe that the nature of these problems can be illuminated
by clinical kinds of inquiry. And, if one chooses to adopt a clinical
attitude, some of these problems may be remedied. 


Social-Psychological Forces in Decisionmaking
at the Classroom Level

In a recent study of cross-age tutoring in the public schools, 
fifth and sixth grade students designated as slow learners were 
assigned to help second and third grade students, also designated 
as slow learners. The older tutors were taught how to diagnose and 
prescribe and how to prepare lesson plans for their students. The 
project was administered by the teachers involved and monitored 
by the evaluation team of the school district. After a few weeks 
there were some behavioral indications that the program was a 
success. The older tutors were spending a good deal of time pre- 
paring to work with their younger tutees. They were also seeking 
guidance from the teachers about rules of decoding, problems of 
word meaning, and other areas of reading that they had had dif- 
ficulty with previously. No gains or losses in achievement were 
noted in the case of the younger students, but it was clear to every- 
one involved that the program was having beneficial effects for 
older tutors. The program was cancelled. The teachers did not
like it.

Why is it that a seemingly successful program was dropped from 
the schools? What did teachers notice that made them want to 
stop the program? Were they embarrassed by the potential positive 
effects of the study? Were they frightened of losing their jobs be- 
cause the students might no longer be designated as “slow”? What 
good is it to teach someone about diagnostic and prescriptive pro- 
cedures, the use of contingent reinforcement, and the advantages 
of practice, if the program it is a part of can be stopped because 
of unknown social-psychological forces at work? Learning more 
about these forces may help others who choose to innovate under-
stand how acceptance of an innovation occurs. The way one would
learn this, I think, is by taking a clinical attitude and adopting
clinical methods.


Changing Classroom Activities 

In a recent study of four second grade elementary school class- 
rooms, four instructional consultants tried to change classroom 
processes in order to increase student engaged time in academic 
pursuits (Berliner et al., 1978). From one diary of the process
we learned that the consultant raised the possibility of using goal 
setting to increase student engagement during independent seatwork. 
Some reward, he suggested, such as a free reading period, could
be used as an incentive for students to complete their assigned
work within a given period of time. (This procedure was called goal
setting to ease the shock for teachers, since past experience had
revealed that the term "contingency management" set off visceral 
reactions of great magnitude.) The idea of goal setting was explained 
to the teacher and put into contexts that she understood, because 
the suggestion came about after many hours of classroom obser- 
vation. The teacher responded negatively, saying that she did not 
like the idea of goal setting. The strategy could never be imple- 
mented because it always met with resistance. What good is it, 
we wondered, to have knowledge of behavior modification proce-
dures and confidence in the power of such approaches, when imple-
mentation of that knowledge is sometimes so difficult to achieve?
In another class, the instructional consultant tried to restructure 
the classroom management procedures. Students were given manage- 
ment cards that told them where they should be at different times 
during a one hundred minute block of time during each morning. 
Large clocks were purchased so that students could monitor their 
time in different activities. The new system was designed to allow 
some time for the teacher to engage in mathematics instruction 
with carefully selected small groups of students. This pleased him 
because there had been very little time for the actual teaching 
of mathematics concepts under the previous classroom manage-
ment system. The new procedure seemed to be working well for
the teacher and students, but not for the paraprofessional class- 
room aide, who now had many more responsibilities. Formerly, 
the aide sat at a desk and checked mathematics computations. 
Now the aide had to perform four different functions during this 
one hundred minute time period. Although the teacher was re-
juvenated with his newfound time to teach, he eventually switched
back to the older management structure because of the aide's un-
happiness. We wondered about the nature of the rela-
tionship between teacher and aide. Who was really in charge? Why
did the aide control the type of management system present in
the class? Was the teacher, despite verbal expressions of delight,
quite frightened of teaching? It is possible that it is easier to assign
workbook pages than to try to teach mathematics concepts. If
the teacher is responsible for instruction, a student's failure to
comprehend must be much more of a personal blow than it is if
instruction takes place by means of a workbook page.
In another classroom, the consultant taught the teacher how
to structure activities. This meant having the teacher give explicit
directions and instructions so that all students knew what was
expected of them at all times. The number of children not engaged 
in academic pursuits was high in this class, in part because the 
teacher assumed that the children knew what workbook pages to 
be on, which contracts to be in, which listening stations to be at, 
and so forth. In fact, many of the children did not know what 
to do or where to be. The teacher was instructed in structuring 
activities, and frequency counts of structuring moves were made. 
The teacher’s behavior changed in the direction desired, and she 
acknowledged the obvious increase in engaged time for students 
in that classroom. Many more students appeared to be doing what 
she had hoped they would do. The consultant and the teacher 
both agreed that a very functional modification of her behavior had 
occurred. Four weeks later it had vanished. What is the nature of 
the classroom such that obviously effective teacher behavior is 
dropped from the teacher’s repertoire? 

There were present in each class forces that resisted change.
These forces are far more powerful than those who would be agents 
of change can imagine. But we are dealing with events no different, 
I think, from the defense mechanisms that occur during clinical 
studies of individuals. We can use clinical methods to study these 
events, to figure out what exactly is happening and why they occur.
We might also want to adopt a clinical attitude in order to help 
teachers make changes in their behavior. 


Intentionality in Classroom Studies 

A totally different area for research in studying teaching and 
learning has to do with what educational philosophers (Fenster- 
macher, 19779) call intentionality. When studying the teacher in the 
classroom, the research paradigm usually in effect has been smugly 
behaviorist. From the behavior of the teacher and the behavior 
of the students, cause and effect relationships are hypothesized. 
Sometimes these are subject to experimental tests, sometimes they 
are presented in a correlational form, but always it is the observable 
behavior that is examined. But this need not be so. The powerful 
perspective about admissible evidence in scientific inquiry allows 
the everyday ordinary language explanations of persons to be taken 
seriously as explanations of conduct. “The things that people say 
about themselves and other people should be taken seriously as
reports of data relevant to phenomena that really exist and which 
are relative to the explanation of behavior" (Harré and Secord, 
1972:7). Understanding the intentions of teachers as they work 
calls for clinical inquiry of a type not ordinarily undertaken. Recent 
research that comes close to this approach includes that of Shavelson
(1976) and of Shulman and Elstein (1975), who studied the decision- 
making of teachers. Teacher decision making in these studies is 
described as clinical information processing, and the focus is on 
the uniquely human characteristics of thinking and feelings, as 
well as behaving" (Shulman and Lanier, 1977:49). Such research 
cannot occur without consideration of the purposes of a teacher 
in the classroom. These purposes, I think, can be discovered only 
through clinical interview, clinical probing, and careful descriptions 
of the behaviors of teachers in classrooms. 


On Origins and Pawns 

Many of the studies of teaching and learning have, at least implicitly,
adopted paradigms in which the teacher is conceived of as an
origin, or controlling agent, and the student is thought of as a pawn,
or a subject of control. Observation reveals that this is not the
state of events in all classroom settings throughout the day. Nor
is it an accurate description across classes. Most interesting of all
is that it is probably not an accurate description of the same teacher
over successive years. For example, the correlational data on teacher
effectiveness from year to year reveal low estimates of stability
when student achievement is used as the criterion. In part, this is
because of the individual makeup of each class. The result is that
teacher behavior is not emitted in some consistent fashion year
in and year out. Instead, behavior is often elicited from teachers
by special student demands. The basic controlling agent for the
interaction may not be the teacher. This interplay of origin and
pawn roles in classrooms becomes very interesting to study, either
day by day or year by year. And these roles can appropriately
be studied using clinical methods.


On Performance Theories versus Learning Theories

Teachers do not hold learning theories. Despite what they have
been taught, they are smart enough to realize that most learning
theory is bunk! Observable performance that is logically or philosophically
related to learning outcomes and to personal survival
is very important in determining classroom life. Teachers
decide to engage in this or that activity based on some notions
of socially [...] behavior. They do not determine their
activities on the basis of some theory about how to optimize student
learning. They keep students engaged because that is good; they
give quizzes because that is accepted; they assign silent reading
because that is considered good and results in personal rest and so
forth. The reasons that teachers have to support certain kinds of 
performance theories need to be explicated. The relations between 
a teacher's personal performance theory and more formal and 
scientifically respectable learning theories might best be studied by 
experimental methods. But uncovering performance theories and 
the rationales that support them calls for a clinical inquiry of great 
sensitivity. 


On Understanding More About Puzzling Findings

There are a number of puzzling findings in the literature that 
pertain to research on productivity and finance. Explication of 
these findings probably requires clinical work. For example, there 
are a number of research studies that show a negative correlation 
between the use of teacher-made materials and student achievement. 
Heavy use of teacher-made materials is often taken as a sign of 
motivation by the teacher to do extra work. Logically, this might 
be an indicator of teacher effectiveness. On the other hand, teacher- 
made materials may not supplement the curriculum materials that 
are in use: the special materials may be divergent. In that case, a
negative correlation would result. Small-scale investigations of 
teachers’ intents and the correlation of the special materials with 
the standard curriculum are in order. Small-scale clinical studies 
will probably be as sensitive to this problem as any other form 
of inquiry. 

There is also a suggestion in the literature that classroom puzzles 
and games, even if academic in appearance, are negatively related 
to achievement. Many instructors are using puzzles and games with 
increasing frequency, thinking that this is a way to keep students 
motivated in academic areas. What is it about puzzles and games 
that might relate positively to academic achievement, and what 
characteristics might be deleterious? As with the problem of teacher- 
made materials, the relation of puzzles and games to achievement 
needs to be investigated. 

Another interesting finding has to do with the inability of many 
school districts to implement a program of instruction. The follow- 
through data are quite clear. Site variance within programs is enor- 
mous. Between program variance in instructional behavior and 
activities was not greater, in general, than the variance within pro- 
grams when different sites throughout the country were examined. 
If specially constructed and clearly defined curricula cannot be 
implemented in any coherent way, many of our national efforts 
[...] may be doomed to failure. What is it about [...]
results in such vastly different implementations [...]
which people at a site can enhance or hinder the
[...] effort? Who has the power to change a defined
[...] such a way that implementation of a program is not
[...] These questions of influence patterns and change are as
[...] by clinical studies as by other means. Certainly
[...] require sensitivity to political and social [...]


Summary 
The point of this section is that there exists a set of problems
in school for which the clinical attitude is appropriate and for
which the clinical method may be a preferred mode of inquiry. It is my
belief that a skilled observer, interviewing teachers, taking seriously
what they say, establishing relationships of trust with them, and
[...] hypotheses with them, can do more to illuminate certain
[...] problems than can dozens of true experimental studies.
In no way does what I am saying deny the utility of traditional
experimental methods. I am merely saying that for a certain set of
problems, clinical methods probably have as much chance of uncovering
important relations as do experimental methods.


METHODOLOGY IN CLINICAL STUDY 


In the last few years, a number of innovations in methodology
have been brought to the fore, and some older methods have received
new respectability. A few of these are mentioned in this
section.

[...] to think about data. Tukey (1977) [...]
appropriate to work [...] of exploratory data analyses.
[...] exploratory data analysis requires a sensitive clinical touch,
[...] as an advanced set of [...] skills. The statistician
[...] these data must have insight into the variables and into
the context within which they occur. As these techniques develop,
[...] melding of clinician and statistician will also take place.


Case Studies

Stake (1978) has upheld the usefulness of the case study method. 
He writes, “When explanation, propositional knowledge, and law 
are the aims of an inquiry, the case study will often be at a dis- 
advantage. When the aims are understanding, extension of ex- 
perience, an increase in conviction in that which is known, the 
disadvantage disappears” (p. 5). In further discussion he says 
that 


intentionality and empathy are central to the comprehension of social 
problems, but so also is information that is holistic and episodic. The 
discourse of persons struggling to increase their understanding of social 
matters features and solicits these qualities. And these qualities match 
nicely the characteristics of the case study. (P. 7) 


The case study is one of the oldest techniques in social science. 
It fell into disrepute as the more statistically minded and the emu- 
lators of nonsocial sciences increased. The case study as a social 
science method is now making a comeback, precisely because it is 
uniquely suited to certain classroom phenomena. In the classroom 
we find particularly complex phenomena of an episodic nature, 
where the intentions of the people in situ must be understood. The 
case study method, in the hands of people who have received training 
(and not just those who choose to write), can make an important 
contribution to furthering knowledge about the realities of the 
classroom. A related way of studying classrooms combines case 
studies with connoisseurship and criticism—scholarly activities 
borrowed from the humanities (Eisner, 1976). 


Ethnography 

Although there seems to be difficulty in defining what is an 
ethnography, it is clear that a rich observational record, relatively 
free from bias, is what is meant by an ethnographic report. These 
reports are records of behavior in settings. They are descriptive,
but can be turned into quantitative data—that is, the frequency 
of cooperative behavior in clay-modeling settings in early child- 
hood classrooms can be recorded. Ethnographers can pick up that 
sort of information if one of the variables they are concerned with is 
cooperative behavior. Just as the ethnographer working with a 
primitive tribe has a set of concepts to work with, having to do with 
religion, sexual rites, passing of property, and the like, the eth- 
nographer in the classroom has to learn which variables are important 
for developing a rich description of the class. Certainly such things as
academic focus, cooperation, democratic attitudes, and conformity
are of concern to the classroom ethnographer. But in our primitive
state of application of ethnographic procedures to classrooms,
we do not yet have rich conceptual categories on which to focus. 
Nevertheless, we can expect ethnographic procedures to be highly 
useful in case studies of classrooms where clinical work is to be 
undertaken. The rich descriptive records provided by ethnographers 
are an excellent basis for discussion and potential change in class- 
room functioning (see Tikunoff, Berliner, and Rist, 1975; Erickson, 
1977; Doyle, 1978). 


Graphics and Time Series

In the past few years there has been an increase of N = 1 studies
(see Kratochwill, 1978). Through the use of graphs as developed
by the behavior modification clinicians and with the use of time-series
analyses, one can infer the cause and effect relations between
environmental events and the behavior of single subjects. These
techniques become invaluable tools for the clinical scientist intervening
in classrooms.


Summary 
As this section has demonstrated, there are new or newly accepted 
methodologies for clinical work. As a paradigm shift takes place, 
these methods and styles of research will become more prevalent. 


CONCLUSION 


This chapter was designed to make a case for clinical studies of
classroom instruction, specifically so that the Finance and Productivity
Center at the University of Chicago could plan their classroom-based
research. First some problems in conducting such
research were noted. Then the idea of trying more clinical approaches
to research was presented; the term “clinical” was [...]
so that some negative connotations associated with [...]
dispelled; and a description of problems [...] by
clinical kinds of inquiry was presented. Finally, I have shown
that there are new or newly rediscovered methodologies for doing
clinical work that have considerable scientific respectability.
The notion of a clinical approach to the [...] much
more to commend it than I realized when I first started [...]
knowledge in such fields as psychology, sociology, and [...]
I hope others will think seriously about clinical [...] the study
of classroom teaching, learning, finance, and productivity.


THE SUBSTRATUM FOR MODEL UNFOLDING


In order to use pupils’ learning time and educational 
resources efficiently and consciously with respect to
learning outcomes, we have to analyze classroom pro- 
cesses in ways that will allow directed change based on insight. 
Research on schooling has to result in more rational and effective 
allocation of human and material resources—intentionality, inherent 
to education, and restrictedness of funds make this necessary. 
Research on schooling should follow a holistic rather than a 
particularistic view of the educational process—that is, the teaching- 
learning process should be seen from the perspective of an educationally
engaged citizen or politician who wants to ensure that
resource allocation occurs rationally. This calls for the analysis
of the relationship of relevant and changeable determinants of
school learning to outcomes. Instead of evaluating curricula or
modules thereof, instead of investigating the effectiveness of teacher
characteristics and teaching strategies, and instead of singling out
isolated pupil characteristics such as aptitude or engagement, we
have to ask, How can a child’s school life be organized so that
he will learn most efficiently what he should acquire? This implies
a segmentation of the teaching and learning processes without
destroying the Ganzheit of curricular teaching and learning aspects,
and this requires relating resources available and their use to pupils’
learning experiences. 

The isolation of the study of curriculum from that of instruction, 
of teaching, of school organization and administration, and of 
school finance has amputated the practical and conceptual relevance 
of much of educational research. Also, the research focus on narrow 
segments of the child’s classroom experiences has severely con- 
strained the use of results for change in praxi, as we lack a com- 
prehensive description of classroom learning into which these 
particularistic researches may be integrated. Research that contri- 
butes directly to practical education must begin holistically and 
move toward more differentiated, minute issues of the teaching- 
learning process. Narrowed, well-defined studies and experiments 
may help to clarify aspects of classroom learning but are not able 
to contribute directly to inferences for practical and cost-efficient 
change. 

Many of the particularistic research foci and models developed 
in splendid isolation might be ultimately integrated into a broad 
perception of the educational process. We have taken on the task
of developing a model that spans levels of generality and, likewise,
specificity so as to allow integration.


MODELS OF SCHOOL LEARNING 


For the past three years, we have been engaged in the continuing 
development of a model of school learning (Wiley and Harnischfeger, 
1974; Harnischfeger and Wiley, 1975, 1977). This model draws 
heavily on Carroll’s Model of School Learning (1963) but is also 
influenced by Bloom’s “Time and Learning” (1973). The consensus 
of the three models is simply stated: Pupils’ experiences, adequately
plumbed by the amount of time spent actively learning, and pupils’ 
characteristics, including their cognitive capabilities, are the sole 
proximal and distinctive determinants of achievement. Instruction 
influences active learning directly via the allocation and use of 
instructional time (opportunity) and indirectly via pupil motivation. 

Beyond these simple commonalities, the models differ both in 
emphasis and in assertion. Carroll focuses on the distinctive role 
that various cognitive abilities play in school learning, discriminating 
the task-specific from the general and carefully articulating their 
relations to quality of instruction. Bloom turns his unrivaled per- 
ception of the sequential character of many classroom learning 
experiences into a strong focus on how performance on one task 
preconditions success in another. Our model refines the nature of 
class learning opportunities, articulating their strategic origins and
their powerful influences on both the content and degree of edu- 
cational achievement. 

The models’ treatments of instruction and its relations to learning 
are diverse and differ greatly in detail and coverage. In contrast to 
Bloom and Carroll, we extensively sift teaching decisions and
activities, elaborating their consequences via a highly differentiated 
segmentation of pupil pursuits. One important outcome of this 
differentiated emphasis is the clarity with which the reasons emerge 
for the great impacts of learning opportunity on achievement. 
This winnowing also permits separating those elements of instruction 
that lead to the content and quantity of pupil opportunity (planning 
and implementing) from those that lead to the degree of active 
learning (motivating and monitoring) and from those that influence 
rate of learning (communicating and presenting). 

All three models, although different in focus, attack issues centrally 
important in educational research, practice, and policy. They provide 
means to overcome nonintegrative views of the teaching and learning 
process. Their level of specification allows them to be used in em- 
pirical research and to answer questions vital to classroom teaching 
and learning. In the work reported here, we have used our model 
to link educational resources to pupils' learning opportunities. We 
will summarize the parts of the model relevant for this study and 
then illustrate the resource-opportunity link with an example—a 
very simplified one, however. 


The Harnischfeger-Wiley Model 
A gross sketch of the model has six components that fall into 
three categories (Figure 5-1). 


Background. Background includes teacher as well as pupil
factors, such as social and home background, age, and sex; teacher
preparation and education; pupils’ prior achievements, motivation,
and aptitudes. It also consists of state, community, district, and
school characteristics, such as curricular guidelines, community
or district wealth, size of district or school, racial composition.
The model only partially specifies and delimits the relevant components
and does not detail their causal linkages. Curriculum and
institutional factors influence both teacher background (in the
form of teacher selection practices, curricular guidelines, and so
forth) and pupil background, in that districts differentially attract
families of varying types, partially regulate school boundaries, and
so on. But a school or district’s pupil background composition also
influences institutional structures and the curriculum. This can take
place not only through parent activities but also through pupil
activities as they relate to educational goals and the school's success
in meeting them.

[Figure 5-1. Gross determinants of pupil achievement (Harnischfeger and
Wiley model). Column headings: Background; Teaching-Learning Process;
Acquisition. Components shown: teacher background; curriculum,
institutional; pupil background; teacher activities; pupil achievement.
Source: Harnischfeger and Wiley, 1977. Reprinted with permission from the
Journal of Curriculum Studies.]


The teaching-learning process [...] pupil acquisition. The teacher
activities themselves are influenced by all [...]


Acquisition. Acquisition [...] currently considers only [...]


Teaching-Learning Process

Learning Settings and the Structure of Opportunity. Pupils’ activities
and their relations to those of the teacher constitute the focus of the
model. The molecular unit in the model [...] curricular content, a
particular grouping (whole class, small group,
or individual work), and a characteristic mode of teacher supervision 
or nonsupervision. Examples include such learning activities or 
pursuits as seatwork in mathematics, reading group instruction, 
and whole class instruction in science. Additionally, pupil activities 
occur in time—that is, they begin and end and consequently have 
defined durations. The time concept allows the determination, 
specification, and subdivision of the quantity of schooling that
particular pupils receive. The total amount of schooling, although 
only an imperfect indicator of the substance and potency of the 
educational process, does have strong, causally interpretable relations 
to achievement (Wiley, 1973); and these relations have potent 
implications for the societal costs of education (Wiley and Har- 
nischfeger, 1974). 

A pupil's total active learning time, devoted to a specific subject
matter or content area, X, is the time relevant for his achievement
on X. This will be considerably less than the nominal quantity
of schooling set by states or districts (Figure 5-2). The nominal
quantity of schooling, defined through the lengths of the school
year and day, may be cut, for example, by teacher strikes, illnesses,
or bad weather conditions. This results in the actual amount of
schooling offered to pupils. For an individual pupil, this quantity
will be reduced by absences. The resulting time, which is the basis
of a pupil's school learning, is the quantity of schooling (for a
particular pupil, K). This quantity is allocated to various curricular
areas and consequently to diverse pupil pursuits. For a specified
curricular area, this results in the total time spent in X pursuits
(for pupil K), which is the key to what and how much K learns
about subject matter X. This amount might be still further reduced
by the amount of time that a pupil is not actively engaged in
learning.
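
The chain of reductions just described, from the nominal quantity of schooling down to one pupil's active learning time in X, can be traced with simple arithmetic. The following is a minimal sketch only: the function name, its parameters, and all numbers in the example are assumptions introduced here for illustration and are not figures from the model or from any study.

```python
def trace_learning_time(nominal_days, hours_per_day, days_lost,
                        days_absent, share_in_x, share_active):
    """Follow one pupil's time from nominal schooling to active learning in X.

    Every argument is a hypothetical input used for illustration only.
    Returns the number of hours remaining at each stage.
    """
    offered_days = nominal_days - days_lost        # strikes, illness, weather
    attended_days = offered_days - days_absent     # the pupil's own absences
    pupil_hours = attended_days * hours_per_day    # quantity of schooling for K
    hours_in_x = pupil_hours * share_in_x          # time spent in X pursuits
    return {
        "nominal": nominal_days * hours_per_day,
        "offered": offered_days * hours_per_day,
        "pupil": pupil_hours,
        "in_x": hours_in_x,
        "active_in_x": hours_in_x * share_active,  # actively engaged time
    }


# Illustrative numbers only; none of them come from the study.
example = trace_learning_time(
    nominal_days=180, hours_per_day=5, days_lost=5,
    days_absent=10, share_in_x=0.2, share_active=0.75,
)
for stage, hours in example.items():
    print(f"{stage:12s} {hours:8.2f} hours")
```

Each stage can only shrink or preserve the quantity before it, which is why the time relevant for achievement on X is considerably less than the nominal quantity of schooling.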


Teaching Strategy and Time Allocations. Teaching and learning
occur in whole class instruction, subgroups, and as individual (seat)
work. Teachers employ different grouping and individualization
strategies depending on pupil characteristics, subject area, curriculum,
resources, and their own preferences. Depending on a
teacher's grouping and individualization strategies, pupils receive
differential amounts of teacher time which impact on pupil achievement.
These strategies also represent an important resource allocation
factor, because the teacher, and thus teacher time, is the
most expensive resource in education.

Our model stresses an analysis of teacher and pupil time, which
are identical only for pure whole class instruction with no pupil
absences. Typically, an unfolding of pupil time reveals differential
pupil allocation to learning settings because pupils work in different
groupings and receive differential amounts of teacher supervisory
or instructional time. The model divides the total time a pupil
spends on a subject matter (X) into seven learning-setting categories
(see Figure 5-2): (1) whole class pursuits, (2) teacher-supervised
subgroup pursuits, (3) teacher-supervised individual pursuits, (4) unsupervised
subgroup pursuits, (5) unsupervised individual pursuits,
(6) transitions, and (7) out-of-school pursuits.

The time allocations of individual pupils to learning settings 
in curricular areas allow us to draw a time ledger for pupils’ and 
teachers’ total school time and also out-of-class learning or work- 
ing time. This allows us to study conditions of pupil learning, 
teacher effectiveness, and resource allocation. The system may 
be used for summarization and aggregation in several ways: sums 
by curricular area, grouping category, or type of teacher supervision
describe aspects of the structure of learning opportunities
open to individual pupils. Aggregations of the complete accounting
or its summations over pupils characterize curricular priorities
or instructional schemes. Further cumulations for types of pupils
or classes with specific attributes would bear upon curricular decisions
or assessments of equality of educational opportunity.
And as we outline below, these summaries can then be interpreted
in terms of pupil achievement, linking school policy and teacher
education to educational goals via the model's specification of the
teaching-learning process.
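
A time ledger of this kind might be represented as a list of records, one per pupil pursuit, that can then be summed along any dimension. The sketch below is illustrative only: the record layout, the function names, the treatment of whole class pursuits as teacher-supervised, and the example entries are all assumptions made here, not part of the model as published.

```python
from collections import defaultdict

# The seven learning-setting categories named in the text.
SETTINGS = (
    "whole class",
    "supervised subgroup",
    "supervised individual",
    "unsupervised subgroup",
    "unsupervised individual",
    "transition",
    "out of school",
)

# Assumption: whole class pursuits count as teacher-supervised.
SUPERVISED = {"whole class", "supervised subgroup", "supervised individual"}


def summarize(ledger, key):
    """Sum minutes in a ledger by one field: 'pupil', 'area', or 'setting'."""
    totals = defaultdict(float)
    for entry in ledger:
        totals[entry[key]] += entry["minutes"]
    return dict(totals)


def supervised_share(ledger, pupil):
    """Fraction of one pupil's recorded time spent in supervised settings."""
    own = [e for e in ledger if e["pupil"] == pupil]
    total = sum(e["minutes"] for e in own)
    supervised = sum(e["minutes"] for e in own if e["setting"] in SUPERVISED)
    return supervised / total if total else 0.0


# Hypothetical entries for two pupils; the values are invented.
example_ledger = [
    {"pupil": "K", "area": "reading", "setting": "supervised subgroup", "minutes": 30},
    {"pupil": "K", "area": "reading", "setting": "unsupervised individual", "minutes": 20},
    {"pupil": "K", "area": "mathematics", "setting": "whole class", "minutes": 40},
    {"pupil": "J", "area": "reading", "setting": "whole class", "minutes": 50},
]

print(summarize(example_ledger, "area"))
print(summarize(example_ledger, "setting"))
print(round(supervised_share(example_ledger, "K"), 3))
```

Summing by area corresponds to curricular priorities, summing by setting to the grouping and supervision structure, and filtering by pupil before summing gives the individual opportunity profile.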


Pupil Achievement. We have traced conceptually an individual
pupil's quantity of schooling through to a specified curricular
area, in a grouping and supervisory surround, stating relevant
influences on the way. Another part of the model specifies the
proximal causes of achievement in terms of learning time, pupil
characteristics, and instruction. However, as this chapter focuses
on the relation between resources and pupil opportunity without
tracing opportunity through to achievement, we will not explain
this part of the model and will instead explore and articulate the
origins of learning opportunity. Specifically, this requires analysis
of the determination of (differential) pupil exposure to learning
settings of various kinds by resource availability and teaching
strategy.
RESOURCES, TEACHING STRATEGY,
AND LEARNING OPPORTUNITY 


It is obvious that even a pupil who is willing to spend much effort 
in learning will not acquire knowledge if he is not allowed time to 
learn; on the other hand, it is common experience that pupils vary 
enormously in their active learning even when allowed equal amounts 
of learning time. However, we would assume that, within boundaries, 
an increase in allocated learning time would increase the amount 
of active learning time and thus achievement. In other words, if 
we extend schooling, we expect more achievement. But we could 
also focus on the relative amounts of time that pupils spend actively 
learning, and we could increase those amounts of active learning 
time during schooling. We would thus increase achievement by 
intensifying schooling and the learning experiences within it. Both 
aspects of school learning and achievement (the extensive and the
intensive) are vitally important in allocating resources, and both
means are used to increase achievement.


Extensity of educational activities can be simply and importantly
quantified as their duration: number of school years, length of
school years and school days, subject matter allocations, and individual
absence rates. On the other hand, intensity of schooling
might be described by the density of teacher-pupil relations,
materials, equipment, and other school-related personnel. Thus,
small classrooms and those with teacher aides or specialists, plentiful
books and curricular materials, and ample equipment may be characterized
as educatively intensive, regardless of the total hours of
instruction in the school day or year.

As the extent or duration of educational experiences and thus 
the total amounts of pupils’ active learning time increase, the quan- 
tities of resources consumed increase concomitantly. As pupils 
spend more years, months, days, or hours in school, their total
[...] increases, but so do expenditures [...], and school facilities.
Intensifying schooling by decreasing class size [...]


There is much controversy over the consequences of such changes 
for achievement. The controversy is exacerbated because educa- 
tional research on school effects has not conceptually differentiated 
extensive from intensive parts of schooling. Investigation has con- 
founded intensive aspects, such as teacher characteristics, with
mixtures of extensive and intensive, such as library volumes and
staff expenditures, and has ignored or miscategorized the extensive
aspect. Consequently, policy decisions concerning extensive or 
intensive aspects of resource allocations have been formed on dark 
and slippery grounds. 

However, scanning the past decade, we find that additional resources
were used mostly to intensify schooling and only relatively
small amounts were used to extend it, usually by offering summer
school sessions. The largest amounts of additional resources during
this time period were provided by the federal government under
the Elementary and Secondary Education Act of 1965, intended
to overcome educational disadvantages caused by poverty. The
resources expended were overwhelmingly used to increase the
numbers of teaching personnel per pupil, that is, to decrease class
size; to hire more teachers, specialists, and teacher aides; to recruit
more volunteers; or to intensify schooling through developing
materials meant to be more effective. The reduction of the ratio
of pupils to instructional personnel is based on the conviction
that the more pupils can interact with teachers, the greater their
achievements will be, presumably because these interactions promote
additional active learning. Teacher specialists were employed,
mostly in reading, on the basis that specific learning problems
could be more effectively and efficiently overcome by teachers
with special skills for treating a defined problem area.

This increase in classroom resources has no direct effect on pupil 
achievement, only indirect effects through pupil activities that, in
elementary school, are primarily determined by the teacher. In
this chapter, therefore, we will focus on how the teacher makes use
of these resources.


Personnel Resources and Their Relation to Teaching Strategy

In a class with both a teacher and an aide, the teacher can work 
with a small group while the rest of the class is supervised by the
aide. A teacher without additional support personnel, on the other
hand, must increase pupils' unsupervised time if he or she desires
to decrease the size of instructional groups. Thus, the effects on
pupils of grouping and individualization decisions strongly depend
on the availability of supporting personnel.

Of course, these effects are also dependent on class size. The
number of children who regularly attend, along with the time commitments
of teaching personnel, determine the normal ratio of
children to adults in the classroom. This ratio is the key factor
that links instructional strategy to pupil experiences.

To flesh out these notions in a fashion that makes clear the overall
interdependency of teaching strategy, resources, and pupil
experiences, we have summarized some data bearing on these issues
(Table 5-1). The personnel figures in the table reflect the total
time commitments of persons in each teaching category. Thus
the 1.11 teachers, if attributed to a single class, could have been
one full-time teacher plus a specialist who works with pupils in
that classroom for 11 percent of the weekly instructional period.
The teacher resources in these classes averaged slightly in excess
of one full-time person. This is mostly due to specialists who visited
the classroom and provided specific instruction for short periods of
time. However, aide resources constituted only about 36 percent of
a full-time equivalent. This comes about because about two-thirds
of the classes had aides and, on the average, they worked just over
half-time. Volunteer efforts were even more severely limited,
volunteers being present only about 17 percent of the time. Again, they
were present in only about two-thirds of the classes, while their
time commitment, when they were available to pupils, was only
about half that of the aides.


Table 5-1. Personnel Resources and Time Distributions to Groupings
(thirty-five first grade classes).

Average Percentage of Time Spent in Particular Instructional Settings by:

                             Pupils   Adults   Teachers   Aides   Volunteers

Without pupils or adults      30.85    14.61       8.14   28.32        12.47
Tutorial with 1 pupil           .59     7.85       7.06    8.75        17.37
Tutorial with 2 pupils          .52     3.54       2.06    3.86        16.28
Group size 3-8                10.56    24.83      23.74   29.83        10.81
Group size >8                 57.48    49.16      59.00   29.24        43.07

Average number of full-time
  equivalent personnel
  for all classes                       1.64       1.11     .36          .17
Number of full-time
  equivalents in classes
  with such personnel                              1.11     .55          .26
Percentage of classes
  with personnel                                  100.0    65.7         68.6

Number of classes = 35
Mean group size:
  Small (3-8)    6.30
  Large (>8)    17.33
Average number of enrolled pupils = 26.07
Average number of attending pupils = 24.30
Average child-adult ratio = [value not legible]
Average length of school day = [value not legible]

Source: Devault, Harnischfeger, and Wiley, 1977a.
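The relation between the two full-time equivalent rows of Table 5-1 can be checked with a short Python sketch. It assumes, as the text describes, that the all-class average equals the share of classes having a given kind of personnel multiplied by the full-time equivalent within those classes. The function and variable names are ours and purely illustrative; the numbers are read from Table 5-1.

```python
# Figures read from Table 5-1 (thirty-five first grade classes).
share_of_classes_with = {"aides": 0.657, "volunteers": 0.686}
fte_where_present = {"aides": 0.55, "volunteers": 0.26}


def fte_over_all_classes(share_with, fte_present):
    """Average FTE per class, counting classes without such staff as zero."""
    return share_with * fte_present


aide_fte = fte_over_all_classes(
    share_of_classes_with["aides"], fte_where_present["aides"]
)
volunteer_fte = fte_over_all_classes(
    share_of_classes_with["volunteers"], fte_where_present["volunteers"]
)

print(f"aides: {aide_fte:.2f} FTE per class")
print(f"volunteers: {volunteer_fte:.2f} FTE per class")
```

The aide figure agrees with the 0.36 reported for all classes. The volunteer figure comes out slightly above the roughly 17 percent cited in the text, a gap that presumably reflects rounding in the published values.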

These personnel were used to fulfill somewhat differentiated
functions. The teachers spent most (59 percent) of their time with
large groups and about one-fourth with small ones, devoting
relatively small amounts to tutorial instruction and activities not
directly instructional. The aides spent much less of their time with
large groups (29 percent), somewhat more than the teachers with small
groups (30 percent), and considerably more in noninstructional
activities (28 percent). The volunteers spent most of their time
with children (88 percent), concentrating on individuals and pairs
(34 percent), but also with an emphasis on large groups (43 percent).
All in all, adults spent almost half their time in large groups,
one-quarter in small groups, 11 percent in tutorial or two-child
settings, and the rest (15 percent) without children.

These allocations probably flow from individual preferences
and assessments about the effectiveness of teaching settings, but
severe constraints on these decisions follow from restrictions on
the personnel resources available. The average number of pupils
enrolled in these classes was just over twenty-six, which, given
a daily absence rate of about 7 percent, means that teaching
personnel deal with twenty-four to twenty-five different pupils during
a typical school day. The effective pupil-adult ratio is, therefore,
almost fifteen to one. This ratio constrains the average amounts
of time that pupils spend in groups of various sizes.
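The arithmetic behind the pupil-adult ratio can be laid out as a minimal Python sketch. The enrollment, attendance, and personnel figures are those reported in Table 5-1, and the 7 percent absence rate is the approximate value quoted in the text; the variable names are illustrative only.

```python
enrolled = 26.07            # average enrollment (Table 5-1)
absence_rate = 0.07         # "about 7 percent", as stated in the text
attending_reported = 24.30  # average attendance (Table 5-1)
adult_fte = 1.64            # teachers + aides + volunteers (Table 5-1)

# Attendance implied by enrollment and the approximate absence rate.
attending_estimate = enrolled * (1 - absence_rate)

# Effective number of attending pupils per full-time equivalent adult.
pupil_adult_ratio = attending_reported / adult_fte

print(f"estimated attendance: {attending_estimate:.1f} pupils")
print(f"pupil-adult ratio: {pupil_adult_ratio:.1f} to 1")
```

The estimate falls in the twenty-four to twenty-five pupil range given in the text, and the ratio rounds to 14.8, the "almost fifteen to one" figure.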


The Determination of Pupil Experiences 

Given a teaching strategy and a configuration of classroom
resources, a logical question concerns the consequences for
pupils: How do these factors influence their educational
experiences?

Suppose, for example, that in a class of twenty-five pupils, a
teacher and his or her aide spend 10 percent and 25 percent of
their time, respectively, tutoring pupils. Since tutorial instruction
involves only one pupil at a time, a typical (i.e., average) pupil
will be tutored 10/25 = 0.4 percent of the time by the teacher
and 25/25 = 1.0 percent by the aide. As this example shows, there
is a strict defining relation between the adult instructional time
available and the grouping context within which it is used, on the
one side, and the average amounts of pupil exposure to instruction
in that context, on the other.

The relationships between pupil and teaching time allocations
for other grouping contexts are as strict and can be generally
stated for groups of all sizes. Generally, the proportionate pupil
time allocation for an instructional context is equal to the
product of three factors: (1) the average group size, (2) the personnel
intensity ratio (full-time equivalent teaching personnel per pupil),
and (3) the proportion of instructional time devoted to the
context.

To illustrate, suppose that in one classroom, 40 percent of all
adult time is spent in small groups of six children. Also, suppose
that the class is heavily resourced (for this example we will assume
twenty children taught by a teacher, two aides, and a volunteer),
so that the adult-child ratio is 4/20 = 0.2. Then the percentage
of pupil time spent in such small groups would be (40)(6)(0.2)
= 48 percent.
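The three-factor product can be written as a small Python function. This is only a sketch of the rule as stated in the text: the function name and its parameters are hypothetical, and the inputs are the illustrative values from the two examples (small groups in the heavily resourced class, and tutoring in the class of twenty-five).

```python
def pupil_time_percent(adult_time_percent, group_size, adults, pupils):
    """Percent of an average pupil's time spent in a grouping context.

    Product of (1) the group size, (2) the personnel intensity ratio
    (adults per pupil), and (3) the percent of adult time in the context.
    """
    intensity = adults / pupils
    return adult_time_percent * group_size * intensity


# Heavily resourced class: 4 adults, 20 pupils, 40 percent of adult
# time spent in groups of six.
small_group = pupil_time_percent(40, 6, adults=4, pupils=20)

# Tutoring example: one teacher (10 percent) and one aide (25 percent)
# each tutoring one pupil at a time in a class of twenty-five.
teacher_tutoring = pupil_time_percent(10, 1, adults=1, pupils=25)
aide_tutoring = pupil_time_percent(25, 1, adults=1, pupils=25)

print(f"{small_group:.1f} {teacher_tutoring:.1f} {aide_tutoring:.1f}")
```

Treating each adult category separately, as in the tutoring example, simply applies the same rule with one adult at a time.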


Returning to Table 5-1, [...] instruction occupies almost [...]
ratio of 14.8 implies that [...] during less than six-tenths of 0.06
[...] over 30 percent of the time.

It is difficult to see how a revision of the teaching strategy
could materially improve the situation. If one- and two-pupil settings
were entirely eliminated [...]

Thus, since group sizes and the proportion of teaching time
devoted to them are determined by teaching strategy and since
the personnel ratio indexes the intensity of the resources
allocated, instructional strategy and resources [...]
influence pupils' educational experiences.


TRACING PERSONNEL RESOURCES TO
THE PUPIL: AN EXAMPLE


School administrations allocate a defined amount of instructional
resources to elementary school classes or groups of pupils. These
resources consist minimally of a teacher, a classroom, curricular
material, and equipment such as furniture and perhaps a piano
or overhead projector; but these resources might be augmented
by time commitments of other personnel such as aides, volunteers,
or specialists, as well as by special resources for field trips. The
classroom teacher, whose teaching time is the major resource,
controls to a large degree the allocation of these additional resources.
The teacher groups pupils; allots curricular materials and assigns
tasks; parcels out his or her time to the whole class, groups, and
individual pupils; allocates the working times of aides or volunteers,
if available, to groups and individuals or to materials preparation;
plans field trips; and prepares instructional materials, such as work
sheets and the like. Thus, the major portion of educational resources
is directly allocated to pupils through a teacher's teaching
strategies and their implementation.

A teacher interacting with a pupil has assigned that pupil a part
of his or her time, and the school district has devoted a part of
its monetary resources to that child via the teacher's salary. A
teacher devoting more time to some pupils than to others has,
in effect, differentially allocated resources. In the example below,
we attempt to trace this resource allocation from school district
budgetary decisions through to the dollar value of resources
received by individual pupils.

In order to accomplish this, logically we must: 


• Divide the school district expenditures between those that are
  used to purchase items and services that directly affect the pupil
  (e.g., materials, teacher time) and those used for indirect services
  (administrative time, janitorial services);
• Follow those expenditures down to school and classroom;
• Differentiate teacher time with respect to the amount used for
  managerial tasks;
• Follow a teacher's grouping and individualization strategies;
• Assess resource allocations with respect to teacher and aide
  time, as well as materials, down to the level of the individual
  pupil in a classroom;
• Disentangle teacher time from pupil time;
• Trace individual pupils' actual instructional time over the school
  day in all areas taught; and
• Estimate the dollar expenditure per individual (not average)
  pupil in a classroom.


In this chapter we will accomplish only some of these tasks,
and this will be done by means of an extremely simplified example.
However, we hope that this first attempt at tracing school district
and classroom resources to individual pupils' learning opportunities
will open a new avenue for evaluation and accountability.

We have constructed our example so that (personnel) dollar
resources are followed down to individual pupils. We have invented
an elementary school district containing two schools, twelve classes,
and 300 pupils. We have linked the data for this hypothetical school
district to real time allocation data [...]

Organization [...] example district [...] district has [...]
grade level. These classes have average enrollments of twenty-five
pupils and are staffed by classroom teachers paid an average of
$15,000 per [...] support, and 40.6 percent is devoted to
administrative, clerical, custodial, and health services.

Our example district is rather wealthy. This becomes more
obvious when we translate [...]

If we now trace costs to the two example classrooms, then
we assume, to simplify, that the district's twelve classes share equally
all personnel expenditures for administrative and noninstructional

Table 5-2. A Small Elementary School District and Its Personnel
Expenditure.

I. Demographic characteristics

   Enrollment:          300 pupils
   Number of schools:   2
   Grade span:          1-6
   Number of classes:   12 (1 per grade in each school)
   Average class size:  25

II. Personnel expenditure

   Central office staff
     Administrative and noninstructional support staff:
       1 Superintendent                     $ 28,000
       1 Business manager                     24,000
       1 Administrative intern                10,000
       1 Nurse                                14,000
       1 Secretary                             9,000
                                                         $  85,000
     Instructional support staff:
       1 Curriculum coordinator             $ 23,000
                                                         $  23,000
     Specialist teachers:
       1 Specialist in reading/language     $ 18,000
       1 Specialist in art                    14,500
                                                         $  32,500
   School personnel
     Administrative and noninstructional support staff:
       2 Principals @ $22,000               $ 44,000
       2 Secretaries @ $7,500                 15,000
       2 Custodians @ $8,500                  17,000
                                                         $  76,000
     Teaching staff:
       12 Classroom teachers
          @ $15,000 (average)               $180,000
                                                         $ 180,000

   District's total personnel expenditure                $ 396,500


as well as districtwide instructional support staff. These costs bind
46.4 percent of total personnel expenditures, or $613 per pupil.
However, district classes vary in classroom teacher salaries as well
as in specialist teacher allocations. Our classrooms do not share
the specialist teachers. These pupils consequently do not receive
any of the average $108 per pupil for specialist instruction. On
the other hand, the teachers of our example classes (grade 1 classes)
receive far above the district's average teacher salary because of their
long teaching experience. Thus, the average per pupil expenditure
for classroom teacher time in these two classrooms is higher than

Table 5-3. Personnel Expenditure for Instructional and Other Staff.

                                                             Percent of
                                                 Dollars     Total Expenditure
Personnel expenditure for direct instruction
  Specialist teachers                           $ 32,500
  Classroom teachers                             180,000
                                                $212,500         53.6
Personnel expenditure for instructional support
                                                $ 23,000
                                                $ 23,000          5.8
Personnel expenditure for administrative and
noninstructional support staff
  Central office                                $ 85,000
  Schools                                         76,000
                                                $161,000         40.6
District's total personnel expenditure
                                                $396,500        100.0


the district average (Table 5-5). Thus, although the classes do not
receive any specialist time, the per pupil expenditure for direct
instruction is 10 percent higher in class 1 and 6 percent higher
in class 2 than the district average. We want to keep these district


Table 5-4. Per Pupil Expenditure for Instructional and Other Staff.

                                              Dollars      Percent of Total
                                              per Pupil    Expenditure
Personnel expenditure for direct
instruction
  Specialist teachers                            108            8.2
  Classroom teachers                             600           45.4
                                                 708           53.6
Personnel expenditure for
instructional support                             77            5.8
                                                  77            5.8
Personnel expenditure for administrative
and noninstructional support staff
  Central office                                 283
  Schools                                        253
                                                 536           40.6
District's per pupil total personnel
expenditure                                    1,321          100.0

Table 5-5. Per (Enrolled) Pupil Staff Expenditure for Example Classes.

                                          Percent of              Percent of
                               Class 1    Total        Class 2    Total
Class size (enrollment)           27                      28
Personnel expenditure for
direct instruction
  Specialist teachers              0                       0
  Classroom teachers
  ($21,000)                      778                     750
                                 778        55.9         750        55.0
Other personnel
expenditure                      613        44.1         613        45.0
Total per pupil personnel
expenditure                    1,391       100.0       1,363       100.0


and classroom level averages in mind when we now trace instruc- 
tional costs to individual pupils in these two classes. In order to do 
this, we analyze the amounts of teacher time that individual pupils 
receive and attach dollar figures to them. 
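The per pupil figures in Tables 5-3 through 5-5 follow from simple division, which the Python sketch below retraces. It assumes, as the text does, that administrative, noninstructional, and instructional support costs are shared equally by all pupils in the district; the dictionary keys and variable names are ours.

```python
ENROLLMENT = 300  # pupils in the hypothetical district (Table 5-2)

# Personnel expenditure by category, in dollars (Table 5-3).
expenditure = {
    "direct instruction": 32_500 + 180_000,
    "instructional support": 23_000,
    "administrative and noninstructional": 85_000 + 76_000,
}
total = sum(expenditure.values())

per_pupil = {name: dollars / ENROLLMENT for name, dollars in expenditure.items()}
share = {name: 100 * dollars / total for name, dollars in expenditure.items()}

# Costs assumed to be shared equally across the district's pupils.
shared_per_pupil = (
    per_pupil["instructional support"]
    + per_pupil["administrative and noninstructional"]
)

# Example classes: teacher salary of $21,000, no specialist time (Table 5-5).
TEACHER_SALARY = 21_000
class_enrollment = {"class 1": 27, "class 2": 28}
direct_per_pupil = {c: TEACHER_SALARY / n for c, n in class_enrollment.items()}
percent_above_district = {
    c: 100 * (d / per_pupil["direct instruction"] - 1)
    for c, d in direct_per_pupil.items()
}

for c in class_enrollment:
    total_per_pupil = direct_per_pupil[c] + shared_per_pupil
    print(c, round(direct_per_pupil[c]), round(total_per_pupil),
          round(percent_above_district[c]))
```

Under these assumptions the shared costs come to about $613 per pupil, and direct instruction in the two example classes comes out about 10 and 6 percent above the district average, matching the text.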

In school 1, the day is five hours and twenty minutes long. School
2 has a longer day: six hours and five minutes (Figure 5-3). This
discrepancy results from the fact that the pupils in school 2 go home
for lunch while those in school 1 have lunch in school. Lunch time
for school 1 is accordingly shorter. The scheduled times for
instruction are more uniform: four hours and thirty minutes versus four
hours and twenty-five minutes. However, scheduled instructional
time is not necessarily equal to actual instructional time (Table 5-6).
The class in the first school used about thirty-two minutes more
than the scheduled time for breaks (lunch, recess, and toilet), cutting
actual instructional time down to 238 minutes. The class in the
second school actually extended the school day by 3 minutes and
used 2 minutes less than the scheduled lunch time to extend
instruction by 5 minutes beyond the 265 minute schedule. From these
and other data, it became clear that school 1 has unrealistically
scheduled too little time for breaks.

The teachers used the available instructional times quite
differently (Table 5-7). The first teacher devoted ninety-four
minutes (40 percent) to pupil contact and seventy-two minutes
each to monitoring seatwork and for classroom management
and transitions between activities. The second teacher, on the
other hand, spent 175 minutes (65 percent) of the time directly
teaching pupils. This is 86 percent more pupil contact time than
the first teacher spent, a result that came about because of closer

Table 5-7. Teacher and Pupil Time.

                                 School/Class 1             School/Class 2
                                        Percent of Total           Percent of Total
                              Minutes   In Class Time    Minutes   In Class Time
Teacher
  Pupil contact                  94         39.5           175         64.8
  Monitoring                     72         30.3            70         25.9
  Managerial-transitional        72         30.3            25          9.3
Average Pupil
  Teacher contact                31         13.0           124         45.9
  Seatwork                      117         49.2           128         47.4
  Managerial-transitional        90         37.8            18          6.7
Total In Class Time             238        100.0           270        100.0


correspondence between scheduled and actual break time and less 
time spent in management tasks and transitions. 
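The percentages quoted for the two teachers can be recomputed from the minutes in Table 5-7. The following Python sketch is only a check on that arithmetic; the variable names are illustrative, and the 270 scheduled minutes and 32 minutes of extra break time for class 1 are the values stated in the text.

```python
# Minutes observed on one day (Table 5-7).
in_class_minutes = {"class 1": 238, "class 2": 270}
teacher_contact_minutes = {"class 1": 94, "class 2": 175}

# Teacher-pupil contact as a percent of total in-class time.
contact_share = {
    c: 100 * teacher_contact_minutes[c] / in_class_minutes[c]
    for c in in_class_minutes
}

# How much more contact time the second teacher provided, in percent.
extra_contact_percent = 100 * (
    teacher_contact_minutes["class 2"] / teacher_contact_minutes["class 1"] - 1
)

# Class 1: 270 scheduled minutes less about 32 minutes of extra break time.
class1_actual = 270 - 32

print({c: round(s, 1) for c, s in contact_share.items()})
print(round(extra_contact_percent), class1_actual)
```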

In addition, because of differences in teaching strategy (that
is, time allocations to supervised group settings with varying numbers
of participants), this 86 percent difference in teacher time
translates to almost a fourfold difference (31 versus 114 minutes) in
teacher contact time for pupils. Thus even though pupils spent
similar amounts of time in seatwork (117 versus 128 minutes or
about 50 percent of their time), time on actual instructional tasks
(including seatwork) was 64 percent greater in class 2 than it was
in class 1 (242 versus 148 minutes).

The two teachers followed quite different grouping and
individualization strategies (Table 5-8). In class 2, a third of the time
is spent in whole class instruction as compared with only 6 percent
or 7 percent in class 1, depending on whether we focus on teacher
or average pupil time. The teachers used subgroup instruction to
about the same extent.5 But the teacher in class 1 spent considerably
more time tutoring and monitoring, which resulted in, relatively,
much more average seatwork time for that class.

But within each class [...]

In class 1, the pupil who receives minimum teacher time spends
ten minutes in whole class instruction and receives six minutes

Table 5-8. Teachers' Grouping and Individualization Strategies.

                      Teacher Time            Mean Pupil Time
Type of                        Percent                  Percent
Instruction         Minutes    of Total      Minutes    of Total

School/Class 1
  Whole class          10          6            10          7
  Subgroup             58         35            19         13
  Tutorial             26         16             1          1
  Monitoring           72         43             -          -
  Seatwork              -          -           117         79
  Total time          166        100           148        100

School/Class 2
  Whole class          82         33            82         32
  Subgroup             86         36            42         17
  Tutorial              7          3             a          b
  Monitoring           70         28             -          -
  Seatwork              -          -           128         51
  Total time          245        100           252        100

a Less than 0.5 minutes.
b Less than 0.5 percent.


of teacher instruction in a subgroup. That is, over the whole school
day, this pupil spends sixteen minutes in settings where the teacher
is instructing and spends all the other instructional time in
seatwork. If we take group size into account and dilute the
teacher-pupil contact time proportionately by the size of the group, then
this pupil received, on the day of observation, a teacher resource
equivalent of two and one-half minutes, only a little more than
half the average pupil resource allocation, which amounted to
four and one-half minutes. However, the pupil with maximum
teacher contact time received six minutes, more than double the
amount of the pupil with minimum teacher contact. This is
due to more subgroup instruction for the "maximum pupil,"
since all pupils share equally the whole class instruction, and this
pupil did not receive any tutoring.

The picture in class 2 is quite similar, in relative terms. The
minimum pupil received a little more than half the teacher resources
of the "average pupil," and the "maximum pupil" received more
than twice that of the "minimum pupil." However, the
allocation in minutes is considerably greater in class 2
because the teacher, on the whole, spends considerably more time
in direct pupil instruction. It is apparent that teaching strategy
seriously affects teacher-pupil contact time for the class as a whole
as well as for individual pupils.
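The dilution of teacher contact time by group size can be illustrated with a brief Python sketch. The ten minutes of whole class instruction and six minutes of subgroup instruction are the values given for the pupil with minimum teacher time in class 1; the group sizes (22 pupils present, a subgroup of 3) are illustrative assumptions of ours, not figures reported in the chapter, and the function name is hypothetical.

```python
def diluted_minutes(settings):
    """Teacher resource equivalent: minutes divided by pupils sharing them.

    `settings` is a list of (teacher_minutes, group_size) pairs.
    """
    return sum(minutes / group_size for minutes, group_size in settings)


# ASSUMED group sizes: whole class of 22 pupils present, subgroup of 3.
minimum_pupil = diluted_minutes([(10, 22), (6, 3)])

print(f"teacher resource equivalent: {minimum_pupil:.2f} minutes")
```

With these assumed sizes the result is close to the two and one-half minutes cited in the text, but other plausible group sizes would shift it, so the sketch illustrates the method rather than the chapter's actual data.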

In order to complete our resource flow exposition, we will now
translate teacher time allocations to pupils into dollar figures. Aside
from the assumption that our observations are typical, this will
be done by specifying that teacher salaries are paid only for the
typical 180 day school year and only for the scheduled
instructional time. We will determine the teacher cost bound in direct
teacher-pupil interaction. The importance of the argument resides
in the relationships of per pupil teacher cost compared within and
between classrooms and in the relationship of teacher cost bound
in direct teacher-pupil contact as compared with costs bound in
seatwork and in managerial and monitoring activities. Both of
these relationships are heavily influenced by teaching strategies.

We have stated earlier that the per pupil teacher costs in our
example classrooms were higher than the district average because
the teachers of these classes have considerably greater teaching
experience than the district's teachers on the average. This is of
course also reflected in a teacher's per minute cost (Table 5-10).
We will now analyze how much of these costs are bound in direct
teacher-pupil contact time.
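A teacher's per minute cost can be sketched from the two assumptions just stated: a 180 day school year and payment for scheduled instructional time only. In the Python sketch below, the $21,000 salary is the figure shown for the example classes in Table 5-5, and the 270 and 265 scheduled minutes are those given earlier for schools 1 and 2. The function name is hypothetical, and the results are not claimed to reproduce Table 5-10.

```python
def cost_per_minute(annual_salary, school_days, scheduled_minutes_per_day):
    """Teacher salary spread evenly over scheduled instructional minutes."""
    return annual_salary / (school_days * scheduled_minutes_per_day)


SCHOOL_DAYS = 180
SALARY = 21_000  # example classes' teacher salary (Table 5-5)

class1_cost = cost_per_minute(SALARY, SCHOOL_DAYS, 270)  # 4 h 30 min scheduled
class2_cost = cost_per_minute(SALARY, SCHOOL_DAYS, 265)  # 4 h 25 min scheduled

print(f"class 1: ${class1_cost:.3f} per minute")
print(f"class 2: ${class2_cost:.3f} per minute")
```

Multiplying such a per minute cost by the diluted minutes a pupil receives would give the dollar value of direct teacher contact for that pupil, which is the quantity compared in the paragraphs that follow.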

On the average, pupils in class 1 receive less than half of the
teacher resources (45 percent) in direct teacher contact, while
pupils in class 2 receive, on the average, 85 percent of teacher
resources in direct instruction. If we assume that direct teacher
instruction has more impact on pupil learning than seatwork does,
we would like to redirect the teaching strategy used in class 1.

If we go to the individual pupil level and from there to the
extremes, we find that pupils in class 1 can receive as little as
one-fourth of an average pupil's teacher cost in direct teacher contact
while others might receive up to 60 percent of such resources in
that form. The range is even more pronounced in class 2, where
pupils might receive between 44 and 117 percent of average per
pupil teacher resources in direct instruction. It is important to
note that the pupil with the smallest direct teacher resource
allocation in class 2 receives about as much as the average pupil in
class 1.

This differential in teacher resource allocation to pupils is a
consequence of the classroom teacher's teaching strategies. It is
not an outflow of curricular differences, since both first grade
classrooms were working with very similar content emphases,
using, for example, identical reading materials. Which strategy
can be considered more effective? We have not traced the teachers'
instruction through to pupil achievement, but, in general, it is
justifiable to believe that more teacher-pupil contact results in
more [...]. This has been the basic rationale for smaller class
[...]. Thus if this assumption holds, the teacher in class 1
does not allocate her time as effectively as the teacher in
class 2.

RESOURCE LIMITS AND PUPIL OPPORTUNITY 


In a simplified fashion, we have analyzed the powerful role
of [...] resources in constraining and delimiting the
[...] decisions made by teachers. These constraints
determine the average resource allocation per pupil and limit the forms
that these allocations can take. They force teachers into
trade-offs between intense contact with few pupils accompanied by
large amounts of unsupervised time for many, on the one hand,
and [...] for all, on the other. Tracing these
[...] to individual pupils has highlighted the vast variations
in [...] opportunity accompanying these resource-constrained
[...] decisions.

[...] teaching resources are scarce, but additionally, some
[...] many more than others. We cannot but conclude that,
within elementary classrooms, the teacher is the major cause of
these [...] differences in pupil opportunity and use of resources.
[...] are fundamentally disturbing to us. Some classrooms
seem [...] on the verge of being expensive depositories, while
others actually serve to provide education. The reasons why some
pupils might actually receive very little teacher instruction must
be [...] posed.

[...] concern for pupil learning, especially for pupils who live
in [...] that do not support school type learning, makes
it [...] to open the classroom door if we are to improve
education [...]. Teacher education and continuing teacher training will
[...] if the actual classroom processes continue to be locked
behind that door.



APPENDIX A: RESOURCE AND GROUPING
VALUES FOR FOLLOW-THROUGH
OBSERVATION CLASSES: DESCRIPTION
OF VARIABLES


The distributions reported here show the percentage of time
teachers, aides, volunteers, and pupils spend in each grouping
setting. Instructional groupings include tutorial instruction with
one or two pupils, group instruction with three to eight pupils,
and group instruction with more than eight pupils. The amount
of time that adults spend without children or that pupils spend
in unsupervised activities (without adults) is also reported.

The time percentages were calculated against the base of the
total number of observations made in a class, site, or curriculum.
If, for example, sixty observations were made in a particular class,
perhaps one teacher was observed during fifty-four of these, while
two teachers were observed in the remaining six. This would result
in sixty-six teacher "occurrences" and an estimated full-time
equivalent teacher staff of 66/60 = 1.10. If thirty-three of these
"occurrences" were in small groups, the corresponding teacher time
allocation was estimated as 33/66 or 50 percent. Full-time
equivalences for aides and volunteers were estimated by computations
parallel to those for the teacher staff and the corresponding time
distribution estimated in a manner similar to that for teacher time
allocations.
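The estimation procedure just described can be expressed as a short Python sketch. It uses the worked numbers from the example (sixty observations, one teacher seen in fifty-four and two in six, thirty-three occurrences in small groups); the function name and argument names are hypothetical.

```python
def staffing_estimates(observations, occurrences, occurrences_in_context):
    """Return (full-time equivalent staff, share of time in a context).

    FTE is occurrences per observation; the time share is the fraction
    of all occurrences that fall in the given grouping context.
    """
    fte = occurrences / observations
    time_share = occurrences_in_context / occurrences
    return fte, time_share


# One teacher present in 54 observations, two teachers in the other 6.
teacher_occurrences = 54 * 1 + 6 * 2

teacher_fte, small_group_share = staffing_estimates(
    observations=60,
    occurrences=teacher_occurrences,
    occurrences_in_context=33,
)

print(f"FTE teachers: {teacher_fte:.2f}")
print(f"time in small groups: {100 * small_group_share:.0f} percent")
```

The same function would apply to aides and volunteers, since the text describes their estimates as parallel computations.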

The "number of pupils" was calculated in a manner similar to
that used to obtain the full-time equivalent personnel values. Thus,
the "occurrences" of pupils were divided by the "number of
observations" to yield the number of pupils who were present in
the classroom at the times when observations were made. Note
that this figure corresponds to attendance rather than to
enrollment. The pupil time allocations to grouping settings were also
estimated from the "occurrences" of pupils in groups of various
sizes. (See Appendix B for a discussion of the logic of these
procedures.) Ratios of the pupil figure and those for the adults formed
the "pupil-adult ratios." These values are based on those reported
in SRI (1974).

We have also reported means of the actual "enrollment,"
"attendance," and "class duration" values recorded by the classroom
observers at the beginning of each day of observation.


APPENDIX B: CONVERSION OF 
INSTRUCTIONAL TO PUPIL TIME 


The amount of teaching time devoted to a particular grouping
context in which an adult participates, $\tau$, can be summarized as
the product of three factors: (1) the number of instructional
personnel in the classroom, $n$; (2) the proportion of teaching time
devoted to the specific context, $\gamma$; and (3) the daily duration of
instruction, $\delta$; that is, $\tau = n\gamma\delta$. The aggregate daily time that pupils
are exposed to instruction in that grouping context, $\alpha$, is then
equal to the product of the group size, $\kappa$, and the instructional
time offered, $\tau$; namely, $\alpha = \kappa\tau$. As the average pupil in a class of
size $N$ is proportionately exposed to only one part out of $N$ of the
aggregate time, the average daily pupil exposure time in a grouping
context is $\theta = \alpha/N = \kappa\tau/N$. As the daily exposure time can also
be written as the product of the proportionate pupil exposure
time in the context, $\lambda$, and the daily duration of instruction, $\delta$
(that is, as $\theta = \lambda\delta$), this implies that

\[
\lambda\delta = \theta = \frac{\kappa\tau}{N}
             = \frac{\kappa(n\gamma\delta)}{N}
             = \left(\frac{n}{N}\right)\kappa\gamma\delta,
\]

that is, that

\[
\lambda = \left(\frac{n}{N}\right)\kappa\gamma.
\]

That is, the proportion of pupil time spent in a grouping context
equals the product of (1) the personnel intensity ratio, $n/N$, (2) the
group size, and (3) the proportion of instructional time devoted
to that context.

If we write the intensity ratio as $\rho$ and index the supervised
grouping contexts with the size of the group, then

\[
\lambda_\kappa = \kappa\rho\gamma_\kappa .
\]

If we index unsupervised time for pupils and time without children
for adults with 0, then

\begin{align}
\sum_{\kappa=1}^{N} \lambda_\kappa
  &= \sum_{\kappa=1}^{N} \kappa\rho\gamma_\kappa
   = \rho \sum_{\kappa=1}^{N} \kappa\gamma_\kappa \tag{5.1} \\
  &= \rho \sum_{\kappa=0}^{N} \kappa\gamma_\kappa \tag{5.2} \\
  &= \rho\mu, \text{ and} \notag \\
\lambda_0 &= 1 - \sum_{\kappa=1}^{N} \lambda_\kappa = 1 - \rho\mu, \tag{5.3}
\end{align}

where $\mu$ is the mean group size.

We can compute the proportion of pupil time in a range of group
sizes (say $l_1$ to $l_2$) as

\[
\sum_{\kappa=l_1}^{l_2} \lambda_\kappa
  = \sum_{\kappa=l_1}^{l_2} \kappa\rho\gamma_\kappa
  = \rho \sum_{\kappa=l_1}^{l_2} \kappa\gamma_\kappa .
\]

The ratio of pupil to teaching time in this range is

\[
\frac{\sum_{\kappa=l_1}^{l_2} \lambda_\kappa}{\sum_{\kappa=l_1}^{l_2} \gamma_\kappa}
  = \rho\,\frac{\sum_{\kappa=l_1}^{l_2} \kappa\gamma_\kappa}{\sum_{\kappa=l_1}^{l_2} \gamma_\kappa}
  = \rho\,\mu(l_1, l_2),
\]

where $\mu(l_1, l_2)$ is
the mean group size for groups having between $l_1$ and $l_2$ pupils.
Therefore, denoting

\[
\sum_{\kappa=l_1}^{l_2} \lambda_\kappa \text{ by } \lambda(l_1, l_2), \text{ and }
\sum_{\kappa=l_1}^{l_2} \gamma_\kappa \text{ by } \gamma(l_1, l_2), \text{ then}
\]

\[
\lambda(l_1, l_2) = \rho\,[\mu(l_1, l_2)]\,[\gamma(l_1, l_2)]. \tag{5.4}
\]

This average result holds for any contextual partitioning of
instructional and pupil time, which can be described by group size
distributions, not merely those defined by group sizes.
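As a rough numerical check, the relation in equation (5.4) can be applied to the averages in Table 5-1. The Python sketch below assumes that the table's mean group sizes and adult time percentages may be plugged in directly, which is only approximately valid because the published entries are averages over thirty-five classes; the variable names are ours.

```python
# Averages read from Table 5-1.
adult_fte = 1.64
attending_pupils = 24.30
rho = adult_fte / attending_pupils  # personnel intensity ratio

# Context: (mean group size, proportion of adult time in that context).
contexts = {
    "tutorial, 1 pupil": (1.0, 0.0785),
    "tutorial, 2 pupils": (2.0, 0.0354),
    "small group (3-8)": (6.30, 0.2483),
    "large group (>8)": (17.33, 0.4916),
}

# Equation (5.4): pupil share = rho * mean group size * adult time share.
pupil_share = {
    name: rho * size * gamma for name, (size, gamma) in contexts.items()
}

# Equation (5.3): unsupervised time is whatever remains.
unsupervised = 1 - sum(pupil_share.values())

for name, value in pupil_share.items():
    print(f"{name}: {100 * value:.2f} percent of pupil time")
print(f"without adults: {100 * unsupervised:.2f} percent of pupil time")
```

The small group, large group, and unsupervised shares land very close to the 10.56, 57.48, and 30.85 percent reported for pupils in Table 5-1. The tutorial shares come out somewhat below the tabled values, presumably because averaging across classes does not commute exactly with the product.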


APPENDIX C: TWO SAMPLE FIRST 
GRADE CLASSROOMS 


A Monday in Mrs. Sitner's First Grade Class 


School. The school serves grades K-8. It has a teaching staff 
of about forty-five and an enrollment of about 900 pupils, all of 
whom are black. The school received ESEA Title I funds during 
the observation year, since 44.7 percent of its pupils come from 
low income families. The principal estimated that about 70 percent 
of the pupils live in single parent families. The school operates 
on a "closed campus" basis: instruction begins at 9:10 A.M.,
and pupils remain in the building with their teachers until
dismissal at 2:30 P.M. All pupils receive a free lunch at school,
and about half of the pupils arrive early to participate in a free 
breakfast program. The school building is very old and in poor 
condition. 


Teacher. Mrs. Sitner has been teaching at this school for seven- 
teen years. 


Class. Enrollment in fall thirty-one, now twenty-seven. During
the school year (observations were performed in May), twelve
pupils have transferred out and eight have transferred into the
class. One pupil (16), although enrolled, attended school for only
one day (November 3) and then returned on March 15. He had not
been ill. The class is the one with the lowest achievers out of three
first grade classes in this school.


Absence. On Mondays and Fridays typically five to six pupils
are absent, on Tuesdays, Wednesdays, and Thursdays typically two to
three.


Aide Support. An aide is allocated to the class for one hour
per day. However, the teacher cannot count on the aide for a specific
time.


Curricular Material. Reading—Distar (SRA). 


Grouping. Teacher groups only for reading instruction. She
has three groups for Distar reading.


Weekly Schedule.

Gym          Wednesdays   13:00 - [end time not legible]
Music        Thursdays    12:15 - [end time not legible]
Enrichment   Fridays      13:00 - 14:[minutes not legible]

Enrichment includes Spanish, sewing, art, French, dancing,
embroidery. Interest groups are formed.


Daily Schedule.

Recess             10:30 - 10:45
Lunch              11:00 - 11:25
2 toilet breaks    approximately 5 minutes long


Classroom Atmosphere. Classroom is disorderly. Pupils are
unusually quiet and lethargic.


A Monday in Mrs. Sitner's First Grade Class 


[Times shown as --:-- are not legible.]

 9:10   Bell. Teacher collects money for field trip.
 9:15   Seatwork, materials are passed out (paper). Writing
        assignment: copy from blackboard words and story. If
        finished they should work on mathematics problems,
        also on blackboard.
--:--   Reading Group I (GI) is called to the reading area (9
        pupils, highest group).
--:--   Start GI read aloud whole group. Distar program: "A
        dog was in the park. . . . I live on a ship. . . ." Stories
        130, 131, 132 (preparation).
--:--   End of GI. Pupils return to their seats.
--:--   Reading Group II (GII) is called (3 pupils, middle group).
--:--   Start GII read aloud whole group. Story 103.
--:--   End of GII. Pupils return to their seats.
--:--   Reading Group III (GIII) is called (7 pupils, lowest
        group).
--:--   Start GIII. Sounding words out.
--:--   End of GIII. Pupils return to their seats. Two pupils,
        (3) and (11), usually in GII, were not called for reading.
        They sleep in their seats.
--:--   Toilet recess. Girls line up; boys line up.
--:--   Back from toilet recess. Teacher handles some
        administrative issue with woman who entered the class. One girl
        complains of neckaches.
--:--   Teacher checks writing seatwork and gives individual
        help.
--:--   Recess.
--:--   Back from recess; pupils rest.
--:--   Whole class. Read on blackboard days of the week and
        story writing on blackboard. Story: Today is Monday,
        May 16, 1977. It is a warm day. The sun is shining. The
        flowers are blooming. The leaves on the trees are green.
--:--   Lunch. Girls line up; boys line up.
--:--   Back from lunch. Whole class seatwork: number copying
        and story copying off blackboard. Teacher checks and
        helps individual pupils.
--:--   Pupils (3) and (11) are called. Reading. They were absent
        over a longer time and receive individual reading
        instruction.
11:58   Start reading (3) and (11).
12:00   End of reading (3) and (11).
12:05   Pupils line up at teacher's desk to deliver their
        mathematics seatwork. Teacher corrects it.
12:24   End of mathematics correcting.
12:25   Physical exercises, whole class.
12:26   End of exercises. Teacher collects pupils' written work
        while pupils stay in their seats.
12:31   GI gets books "Pets and People." Read silently as
        seatwork.
12:35   GII and (3) (GII) and (10) (GIII). Seatwork sentence
        completion.
12:45   (16) and (GII) worksheet for seatwork. Task: detect
        differences in words and underline.
12:50   Worksheet 3 is given to GII: 2, 3, 8, and 13; and GIII:
        10, 14. These six pupils sit down at a special table with
        worksheet and teacher. Task: describe pictures.
13:10   End of work with worksheet 3.
13:12   GI is called. GI starts reading in reading book ("Pets
        and People") about "Gus, the dog."
13:20   End of GI.
13:23   Toilet recess. Girls in line; boys in line.
13:35   Back from toilet recess.
13:37   Whole class seatwork. Teacher checks some pupils'
        work.
13:45   Whole class. Teacher hands out permission forms for
        field trip. Pupils have to write in room number (111).
        Leave: 9:00. Day: Wednesday (is copied from
        blackboard). Date: May 25, 1977. Place: Lincoln Park. Lunch:
        at school. Cost: $1.00.
14:12   End of form writing. Teacher collects work from pupils
        and fills out form for one pupil. Also checks again on
        who already paid for field trip.
14:24   Whole class, clean up desks.
14:30   End of school day.


Figure 5-4. Mrs. Sitner's Classroom, First Grade (5/16/77, 5/17/77).
[Classroom floor plan; labels: board, bookshelves, blackboard, cloak
room. Legend: m = male; f = female; D and A = observers; Roman
numerals denote reading group assignment. All pupils are black.]


A Thursday in Miss Hernandez's First Grade Class


School. The school is a branch elementary school that serves only
grades K-3, with an enrollment of 670 pupils and about twenty-five
teachers. Approximately 95 percent of the pupils are Spanish, and
35-40 percent speak minimal English when they enter kindergarten.
The pupils' families are primarily blue collar workers, with relatively
few families on welfare. This school operates on an “open campus”
basis; instruction is scheduled from 9:10 to 15:15, with
a lunch break from noon until 13:00. About half of the children
go home for lunch. A continuous first through third grade bilingual
program is provided for some pupils. The school was built in 1973
and is completely carpeted and air conditioned.


Teacher. Spanish, twenty years' experience (four to five years
at this school). Has always taught in inner city.


Class. Enrollment: twenty-eight, but only twenty-five attend 
regularly. One has dropped out, one has moved to Texas, and one 
is in Mexico. Only one pupil has officially transferred during the
school year (observations were performed in June). Pupils were
chosen for this class because all of them were speaking only Spanish 
at the beginning of school in September. Most children had not 
gone to kindergarten. Pupils thus range widely in ability, but teacher 
says this is not a problem, that they are "compatible." She feels 
that, because of their language difficulty, the brighter ones haven't 
"discovered their potential" yet. Next year three eight year olds 
will go to a second-third grade class. Pupil (23) will repeat first 
grade; the rest will be split between low and middle second grade. 
Three pupils should be in second grade (pupils 2, 9, 24) and were 
in bilingual classes before this year. Pupil (2) has been labeled mini- 
mal LD, but teacher disagrees. Pupil (23) has a congenital arm 
injury—cannot move right arm freely from shoulder. Doctors say 
the arm needs to be broken and reset, and parents may allow this 
over the summer. His right hand seems to be dominant, but he has 
been trying to write with his left. Teacher has special assignments 
for him for most of the day—he has spent most of the year working 
with manipulables, has just started writing and working on cutting 
and pasting. He reads quite well, is in RGIII. Next year he will
repeat first grade, as he must learn to write before going on. Teacher 
reports that two pupils come into her class for reading instruction. 
Pupil (28) is from second grade—present on both days of obser- 
vation. Other pupil is from third grade. Teacher says he “doesn’t 
need to be here," reads well in Spanish, but his teacher wants him 
to come to her class. He did not come in on either observation
day—I don't know if he was absent or didn't come in for some 
other reason. 


Pull-out Instruction. Two groups go out three times per week 
for Distar language with language teacher. About half the class is 
involved; there is no set schedule for these classes. Teacher says
Distar language program is good for these pupils because it “makes
it easy to start talking.” Pupils' parents come in to talk to the
teacher, but they don't get very involved. Teacher says this is a
cultural difference: “School is the teacher's domain.” All but four
pupils qualify for free lunch. A few families are on welfare, but
usually because they have such large families. 


Absence. On days of observation: Wednesday, no pupil absent; 
Thursday, three pupils (3, 18, 26) absent. 


Aide Support. Teacher has no aide support. Says she could have 
had someone for two forty-minute periods per week, but it is not 
worth it. Teachers decided to give up their aides to allow them to
work in the kindergarten language program.


Curricular Materials. Reading: Distar; they just follow the
program. Reading skill cards: work on skills in order of difficulty.
Mathematics: follows skill cards. Teacher makes or collects own
material. Three groups. Reading and mathematics take most of the
time. Teacher also tries to do music and social studies, but this is
“very incidental.” At beginning of year, taught more, then retaught
lesson in Spanish. Teacher feels that Distar is excellent for these
children because it is so phonetic. Can teach sounds from the very
beginning whether pupils speak English or not. All pupils in this
class are reading, and this is unusual. Also, Spanish is a very phonetic
language, so parents can understand the Distar program better. Says
the program may be too slow “for some children in the suburbs.”


Grouping. Teacher groups for reading and mathematics instruction.
She has three groups for Distar reading and three mathematics
groups. At the beginning of the school year, teacher had two reading
and two writing groups, because some pupils needed more instruction
in Spanish than others.


Weekly Schedule.

Library         Mondays      9:00 - 9:40
Gym             Wednesdays   11:15 - 11:55
Language Arts   Fridays      9:40 - 10:20


Daily Schedule. Teacher has a set schedule; normally does all
reading in the morning. But observation days showed considerable
variation.

Recess   10:40 - 11:00
         13:55 - 14:15
Lunch    12:00 - 13:00


Classroom Atmosphere. Very neat and orderly. Decorated with 
displays of pupils' work and teaching aids (letters and words that 
stick on the blackboard, charts, pictures, etc.). Very relaxed and 
friendly atmosphere. Teacher can joke with pupils, but is also firm 
and consistent. Gives very explicit directions. Appears to keep 
well informed about each individual’s activities and gives individual 
attention very effectively, checking on work or speaking to indi- 
viduals briefly but frequently. Teacher uses Spanish for two reasons: 
(1) to make sure pupils catch on and (2) for general cultural en- 
richment. 


Table 5-11. Miss Hernandez's Classroom Roster.


Pupil Number   Sex   Reading Group(a)   Mathematics Group(b)
 1             f     I                  1
 2             m     I                  1
 3             m     II                 2
 5             m                        1
 6             f     III                3
 7             f     II                 2
 8             f     II                 2
 9             f     I                  1
10             m     III                3
11             m     I                  1
12             f     II                 2
13             m     I                  1
14             f     I                  1
15             m     I                  1
17             f     II                 2
18             m     I                  1
19             m     II                 2
20             m     I                  2
21             m     II                 [illegible]
22             m     II & III           1
23             m     III                3
24             m     I & II             1
25             f     I                  1
26             m     II                 1
27             m     I                  1


(a) Red = I; White = II; Blue = III.

(b) Jets = 1; Boats = 2; Trains = 3.


Figure 5-6. Miss Hernandez's Classroom, First Grade (6/8/77, 6/7/77).
[Classroom floor plan; labels: bulletin board, blackboard, air
conditioner, wardrobe, divider wall (used as bulletin board).
Legend: m = male; f = female; M = observer. All students are Spanish.]


A Thursday in Miss Hernandez's First Grade Class 


9:00 Pupils come in, talk to teacher and each other. Teacher
speaks to (1) about moving his desk.

Pledge and “America.” Pupils hang up sweaters. 

Pupils sit at front of room. Teacher talks about weather 
in spring. Someone asks what scientists are. Teacher 
says “men who study weather, etc." Teacher has a pupil 
find the first day of summer on calendar to show the 
class. They talk about weather in summer. Teacher
shows summer months on a list of months (written in 
both English and Spanish). Teacher asks how many
pupils are going to Mexico for vacation. About 10 raise
their hands.


9:47


10:36

Asks if any students are going to move during the summer. 
Goes over work on board: plurals. Teacher points to 
picture (3 glasses), students say "three glasses." For 
morning work, pupils are to fold paper into 8 boxes, 
then, e.g., draw 3 glasses, write “glasses.”

Goes over blends, especially “sh” and “ch.” Teacher
says, “We people who speak Spanish have lots of trouble
with these sounds. We think everything is ‘sh.’”

Children line up. Teacher gives them pictures of objects
(e.g., a shell). They place the pictures underneath the
sign for the correct blend (e.g., “sh”), then check each
other.

End of exercise on blends. Teacher explains worksheet 
on word endings. On the back is a map showing water 
and land masses in the world. Teacher says to color 
water blue, land brown. Teacher calls 7 children's names. 
Says they did very good work. They stand. Teacher 
calls names of children who didn't hand in papers. They 
go to desks and look for papers, then come to teacher's 
desk. 

Pupils return to their desks, begin seatwork assignments. 
Teacher gives worksheets to (23). 

Children whose names were called earlier go to teacher's 
desk (pupils 2, 8, 10, 13, 22, 24). Teacher talks to 
them about their work. 

Pupils return to their seats. Teacher moves several pupils'
desks.

End of desk moving. 

Teacher checks (23)'s work; circulates, checking on 
others; speaks to (17) in Spanish. 

Teacher calls “White Stars" (RGII: 4 boys, 5 girls). 
RGII starts, works on sounds. 

RGII reads story. Pupils at desk are all working on 
assignments. 

Pupil (1) is lying down, head on desk. (27) raises his 
hand, has finished work. Teacher signals to him to draw 
on the back of his paper. (23) whispers to (20) for 
help, no response. 

RGII ends. Teacher checks on (23). Asks to see (1) 
and (2), looks at their work until 10:39. Talks to (25) 
while they get their papers. 


10:38 
10:39 
10:40 
11:01 
11:02 


11:03 
11:04 


11:06 
11:07 


11:15 


11:30 


11:36 


11:37 


11:39 


11:40 
11:44 
11:56 
11:57 
12:55 
13:02 


13:05 


13:08


(19) goes to speech ([18] is supposed to, but is absent).
All pupils put things away. 

Leave for recess. 

Return from recess. Teacher calls “Blue Stars” (RGIII).
Sends (10) and (23) to get (28) from 2nd grade for 
reading. 

Only (6) is in RGIII; she reads sounds. 

(22) comes to RGIII, reads sounds. Teacher says, “That’s 
beautiful. You can paint instead of reading with us." 
Teacher reminds “White Stars” to work on their take-
homes. (23) returns, they can't find (28)'s classroom.
Teacher sends (9) with them. 

(19) returns from speech. 

RGIII pupils return. RGIII starts (pupils 6, 10, 23, 
and 28). (20) is reading a book. 

(2) and (11) are also reading books now. Teacher says, 
"I'm going to listen to some of you read these next 
week." (They are really reading, not just turning pages.) 
Visitor comes in (a former teacher) to show her baby 
to the class. 

Pupil (9) calls teacher, she checks on painters. (23) 
goes to seat. (28) leaves. Teacher goes over take-home 
with (6) and (10). 

(6) and (1) go to seats. Teacher calls “Red Stars" 
(RGI). 

Someone comes in with a message, talks to teacher.
Teacher asks (9) to “be teacher”; pupils start work
on sounds.

Teacher returns to RGI. 

RGI starts on story. 

RGI dismissed. Pupils put work away. 
Line up, leave for lunch. 


Pupils return from lunch.

Teacher talks about morning work. Goes over alphabetizing.
Gives papers back to (10) and p to be corrected.
Asks class, “Minus is the same as?” Take away.
Gets painters to join class (forgot earlier). Goes over
plural papers. Many pupils didn't put a word in each
box. Teacher returns papers that aren't complete (most
of the class).

Pupils correct papers. Red Stars (RGI) work on take- 
homes. Teacher puts on record of a song they have 
been learning, works at her desk.

13:11 Gives (23) a writing paper.

13:15 Hands back take-homes to be corrected.

13:17 Puts on a record brought to school by (8); says, “In
a few minutes we'll begin our spelling.”

13:19 Teacher circulates, checks (23)'s work. Asks (23), “Did
you do that writing paper? Let's see it.” He goes to
get it, erases part and starts over. Teacher forgets to
come back to look at it.

13:23 Take out spelling materials. Teacher says, “Some people
are going to be in trouble because they put their spelling
paper with yesterday's work. If you don't have it, you
just can't do spelling today.” Three pupils don't have
papers (6, 22, 24).

13:25 Teacher dictates, for example, “His dad is sad.” (22)
takes out a piece of paper. Teacher says, “There's a
smart boy. He's going to write on a plain piece of paper.
I'll let you do that.” (24) takes out paper too. After
a while, teacher tells (6) to get out paper.

13:28 Looks at (23)'s writing; says, “Those are fine; those
you have to do a little better.”

13:30 Teacher writes sentences on board to check.

13:32 Checks (23)'s work.

13:34 Put spelling away, sit in front on the floor.

13:35 Teacher explains next step on paintings: filling in the
details. Teacher talks about details (eyes, hair, buttons,
etc.); shows different kinds of shirts and sweaters worn
by girls and boys; discusses shoes, socks, and so on.

13:47 End of detail discussion.

13:48 Teacher plays record of a story (in Spanish). Shows
pictures to go with story. Listen to song on other side
of record.

13:51 Pupils return to their seats. Teacher passes out story
worksheets, tells pupils she will make a book of all their
stories.

13:53 Class does worksheet of rhyming words on back of
story sheet.

13:55 Bell, pupils line up for recess.

14:15 Pupils return from recess, begin stories. Some pupils
ask her to put words on board; she writes “children,”
“school,” but for some words, says, “that's easy” and
doesn't put them on board. Teacher helps with words
by showing pupils how to sound them out.

14:28 Teacher talks to (23) about his work, says he did cut-and-
paste worksheet “perfectly.” Looks at coloring paper.

14:29 Language teacher comes to get “Red team” for language.
Office calls. Pupil (8)'s mother is there to pick her
up. She leaves at 14:30.

14:31 Teacher calls Jets (Math Group I); calls (7); says to 
try this exercise. (12) wants to try too; teacher says 
she can if she knows the numbers to 100, tells (6) to 
try it too. Pupil (10) is the only pupil at his desk. 

14:36 Teacher dictates numbers; pupils write them in boxes 
on their worksheets (e.g., 92, 54). Teacher says, “Just 
for fun, let's see if you can write these" (110, 105, 
etc.). 

14:44 Pupils (6), (7), (9), and (15) go to closet to paint. 

14:48 Calls (5) (was absent in the morning). He reads aloud, 
teacher gives him a take-home. Other pupils work on 
stories, worksheets, and the like at their desks.

14:52 Pupil (25) says she wants to read for teacher. (During 
morning RG, teacher spoke to her about not paying 
attention.) She reads aloud until 14:55. Teacher says, 
“You’ve redeemed yourself." 

14:57 Teacher talks to (6) and (15) about an accident in the 
paint room. Painters return to seats. 

14:59  Listens to (13) read from a library book. 

15:00 Another teacher comes in to show her a pupil's work. 
(On a worksheet that said “Draw a ring around the 
right answer," the pupil drew a picture of a ring on 
one's finger. Teacher explains this to class, shows them 
the paper.) 

15:02 Music. Teacher plays piano; pupils sing. Teacher begins 
with a Spanish song. Then lets pupils choose songs. 
They choose about half English and half Spanish songs. 

15:06 Pupils return from language. 

15:15 End of singing. Teacher says to take reading books home. 

15:16 Pupils clean floor, line up. 

15:18 End of school day. 


NOTES 


1. These data were collected as part of an evaluation of project Follow- 
Through (Stallings and Kaskowitz, 1974) and characterize classes that con- 
stituted a comparison group for the Follow-Through curricula. 

2. Formal definitions of the various indices exhibited in the table and 
their rationales are given in Appendix A to this chapter. 

3. A discussion and derivation of these relations is given in Appendix B to 
this chapter.

4. These data were collected as a part of a larger study that has been
reported elsewhere (DeVault, Harnischfeger, and Wiley, 1977b).

5. All classroom-specific data are taken from two actual classrooms, 
described elsewhere (DeVault, Harnischfeger, and Wiley, 1977b); the base 
observational records are presented in Appendix C to this chapter.

6. Subgroup instruction was mostly in reading. 

7. This cost is calculated in the following way: teacher yearly salary /
(length of school year × length of school day).
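
The calculation in note 7 can be illustrated with a minimal sketch. It reads the divisor as the product of the length of the school year (in days) and the length of the school day (in hours), so that the result is a cost per hour of teacher time. The function name and all input values below are hypothetical and are not taken from the chapter.

```python
def teacher_cost_per_hour(yearly_salary, school_days_per_year, hours_per_school_day):
    """Cost of one hour of teacher time: salary / (days per year * hours per day)."""
    if school_days_per_year <= 0 or hours_per_school_day <= 0:
        raise ValueError("school year and school day lengths must be positive")
    return yearly_salary / (school_days_per_year * hours_per_school_day)


# Hypothetical inputs for illustration only (assumed, not from the chapter).
salary = 15000.0  # dollars per year
days = 180        # school days per year
hours = 5.0       # hours of school per day

cost = teacher_cost_per_hour(salary, days, hours)
print(round(cost, 2))
```

Under these assumed figures the teacher is paid for 900 hours of school time per year, and the hourly cost is the salary divided by that total.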